[ome-devel] Fwd: File format for large data sets stored at multiple resolutions

Jason Swedlow (Staff) j.r.swedlow at dundee.ac.uk
Tue May 12 09:17:19 BST 2015


Hi Nico, Johan et al-

Apologies for the delay in responding.  Nico’s email started a lot of discussion in the OME team.  I saw Nico and Curtis Rueden a couple of days after this was sent at AQLM (http://www.mbl.edu/education/special-topics-courses/analytical-quantitative-light-microscopy/), and we had a long discussion about this issue over chips, salsa, and PSFs.

As many of you know, this has been a hot topic of late.  Several new and some established technologies (LSFM, DigPath, HCS and others) are now routinely generating heterogeneous (binary pixel data, metadata, and analytics) multi-dimensional, multi-TB datasets.  Within OME, we’ve been discussing how we approach this trend— whether we amend OME-TIFF, define a new format (mindful of a lot of work by others, in particular Cellh5 (http://www.cellh5.org/), BDV (http://fiji.sc/BigDataViewer), OpenSlide (http://openslide.org/) and others), or just wait for someone else to generate yet another file format (YAFF®) or in all likelihood several new file formats (SNFFs®) and doggedly support them all in Bio-Formats.  Johan’s proposed pointer-based solution is, if I have things correct, already implemented in Micro-Manager’s OME-TIFF, for exactly the reason described (apologies to Nico and the MM team if I have this incorrect— please do provide the accurate description of what is going on there if I have failed).

AFAICT, there is a rather old, well-worn debate between the filesystem and database camps on this issue.  To my mind, Jim Gray and colleagues captured this tension most clearly and accurately in 2005 (http://research.microsoft.com/pubs/64537/tr-2005-10.pdf).  It’s almost certainly true that both approaches will evolve side by side, and we (where we = the community) should try to develop both solutions; they each have their place and utility.  OME’s version of this is to develop a data spec (e.g., OME-TIFF), an I/O library (Bio-Formats), and applications that use the format (OMERO).  We are committed to this strategy for any spec we develop.  That might explain, but not excuse, our rather slow approach on this issue.  We insist on the spec and its many implementations.

Mindful of all this, OME's priority is building tools that are as useful, generic and performant as possible.  In this case, that means developing a format that works in many different domains and that includes support for multi-res pixel sets, acquisition metadata, ROIs, trajectories, etc.  In discussing this with Nico and Curtis, we agreed on the obvious: anything we build has to be done in steps.  The overriding immediate problem that several people face is support for large, multi-res, multi-D pixel sets.  The existing (partial) solutions are worth considering, but anything we do must also support both Java and native environments.  OME is committed to at least bypassing, if not removing, the barriers between the Java and native worlds, where we have the resources to do so.

We’ve added this topic to the workshops at our upcoming Users meeting and welcome input there (https://www.openmicroscopy.org/site/community/minutes/meetings/10th-annual-users-meeting-june-2015).  It looks like we will have a very strong turnout for the meeting.  We’d encourage anyone interested to join us there, but obviously also welcome input on this list, Forums, etc.  We’ll report back with our plan for addressing this very important point.

As always, thanks for your support.

Cheers,

Jason

--------------------
Centre for Gene Regulation & Expression | Open Microscopy Environment | University of Dundee

Phone:  +44 (0) 1382 385819
email: j.swedlow at dundee.ac.uk

Web: http://www.lifesci.dundee.ac.uk/people/jason-swedlow
Open Microscopy Environment: http://openmicroscopy.org




From: Johan Henriksson <mahogny at areta.org>
Reply-To: OME Development <ome-devel at lists.openmicroscopy.org.uk>
Date: Wednesday, 6 May 2015 22:45
To: OME Development <ome-devel at lists.openmicroscopy.org.uk>
Subject: Re: [ome-devel] Fwd: File format for large data sets stored at multiple resolutions

Hi Nico!

First of all, have you tried the pyramid compression of JPEG2000? (I have not!)

Second, last time I tried large datasets in OME-TIFF it was a huge issue. I tried to convert our 40 GB+ recordings to OME-TIFF and got indexing times from hell (up to 10 minutes). I never had time to properly investigate this. Part of the problem might be that I changed the OME writer to add in JPEG-compressed data (since we already had a lot of JPEGs, and I did not feel like converting those to PNGs).

JPEG2000 pyramids would help you with huge 2D images but not with huge 5D datasets. I believe the problem (anyone please correct me here) is that the OME reader first indexes the TIFF file, and in the worst case does so by going through the entire file to find where each plane is(?). This is in no way fast in the current implementation, as I suspect it jumps through the entire dataset: TIFFs are essentially a linked list of planes. If your compressed output file is 5 GB+, then this alone is really slow.
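
A minimal sketch of the indexing cost described above, assuming a classic (non-BigTIFF) little-endian TIFF in which each plane has its own IFD and each IFD stores the offset of the next one. The helper names `build_tiff` and `walk_ifds` are made up for illustration, the generated file is a toy rather than a valid image, and this is not how Bio-Formats is actually implemented. It only shows that finding N planes takes N dependent seeks spread across the file.

```python
import io
import struct

STRIP_OFFSETS = 273  # TIFF tag holding the offset of the pixel data


def build_tiff(num_planes, plane_bytes=16):
    """Build a toy little-endian TIFF: one data block plus one IFD per plane."""
    buf = io.BytesIO()
    buf.write(b"II" + struct.pack("<HI", 42, 0))  # header; first-IFD offset patched below
    next_ptr_pos = 4  # where the offset of the next IFD must be written
    for i in range(num_planes):
        data_offset = buf.tell()
        buf.write(bytes([i % 256]) * plane_bytes)  # fake pixel data
        ifd_offset = buf.tell()
        # Link the previous IFD (or the header) to this IFD.
        buf.seek(next_ptr_pos)
        buf.write(struct.pack("<I", ifd_offset))
        buf.seek(ifd_offset)
        buf.write(struct.pack("<H", 1))  # one entry in this IFD
        buf.write(struct.pack("<HHII", STRIP_OFFSETS, 4, 1, data_offset))
        next_ptr_pos = buf.tell()
        buf.write(struct.pack("<I", 0))  # 0 marks the end of the chain
    return buf.getvalue()


def walk_ifds(data):
    """Follow the IFD chain; return (plane data offsets, number of seeks)."""
    f = io.BytesIO(data)
    byte_order, magic, offset = struct.unpack("<2sHI", f.read(8))
    if byte_order != b"II" or magic != 42:
        raise ValueError("not a little-endian classic TIFF")
    plane_offsets, seeks = [], 0
    while offset != 0:
        f.seek(offset)  # one jump per plane, each depending on the previous read
        seeks += 1
        (num_entries,) = struct.unpack("<H", f.read(2))
        for _ in range(num_entries):
            tag, _type, _count, value = struct.unpack("<HHII", f.read(12))
            if tag == STRIP_OFFSETS:
                plane_offsets.append(value)
        (offset,) = struct.unpack("<I", f.read(4))
    return plane_offsets, seeks


offsets, seeks = walk_ifds(build_tiff(1000))
print(len(offsets), "planes located with", seeks, "seeks")
```

On a multi-GB file on disk, each of those seeks lands in a different region of the file, which is consistent with the slow indexing times reported here.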

My solution to this would have been an extension with a special data object containing pointers to most or all planes, in a single place. Thus very few reads would be needed to map all planes. But then I moved to another lab and never had time to return to this. I still think it's a problem/solution worth reconsidering. If a TIFF reader does not understand such a special data object it would just ignore it, but specialized readers could gain a lot of speed by reading it.
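
A rough sketch of the pointer-table idea, under stated assumptions: the container layout below (an 8-byte header pointing to an index block marked `PIDX`) is entirely hypothetical and is not part of TIFF, OME-TIFF, or Micro-Manager's format. In a real TIFF the table would more plausibly live in a private tag that readers which do not know it can skip. The point is only that all plane offsets can be recovered with one seek and one contiguous read.

```python
import io
import struct

INDEX_MAGIC = b"PIDX"  # hypothetical marker for the plane index block


def write_with_index(planes):
    """Write planes back to back, then a single table of their offsets."""
    buf = io.BytesIO()
    buf.write(b"\0" * 8)  # placeholder: position of the index block (uint64)
    offsets = []
    for plane in planes:
        offsets.append(buf.tell())
        buf.write(plane)
    index_pos = buf.tell()
    buf.write(INDEX_MAGIC + struct.pack("<I", len(offsets)))
    buf.write(struct.pack(f"<{len(offsets)}Q", *offsets))
    buf.seek(0)
    buf.write(struct.pack("<Q", index_pos))  # patch the header
    return buf.getvalue()


def read_index(data):
    """Return all plane offsets using one seek and one contiguous read."""
    f = io.BytesIO(data)
    (index_pos,) = struct.unpack("<Q", f.read(8))
    f.seek(index_pos)
    magic, count = struct.unpack("<4sI", f.read(8))
    if magic != INDEX_MAGIC:
        raise ValueError("no plane index found")
    return list(struct.unpack(f"<{count}Q", f.read(8 * count)))


planes = [b"a" * 10, b"b" * 20, b"c" * 5]
blob = write_with_index(planes)
print(read_index(blob))
```

The number of seeks needed to map the planes stays constant as the plane count grows, instead of growing with it as in a walk through a chain of IFDs.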

cheers,
Johan







On Mon, May 4, 2015 at 2:08 AM, Nico Stuurman <nico.stuurman at ucsf.edu> wrote:

Dear all,

I have been running into more and more individual efforts to create new
file formats to deal with large datasets that need to be stored at
multiple resolutions to enable fast feedback to the user. Examples are
the hdf5 format used by the BigDataViewer plugin by Tobias Pietzsch and
Stephan Preibisch, the hdf5 format used by Chimera (UCSF-based package
primarily for crystallography and EM that also has amazing capabilities
for 3D visualization of light microscopy data), the Micro-Manager
SlideExplorer plugin for which Arthur Edelstein developed his own
storage system, and the Micro-Manager plugin "Magellan" that Henry
Pinkard is developing right now, which also stores multiple resolution
versions of the data on disk. Doubtless there are many more examples.

Even when conversion between these formats is possible (as long as they
are reasonably documented), conversion becomes time consuming and takes
up large amounts of disk space, simply because the data sets have become
gigantic.  The reasons why everyone designs their own formats are also
clear: there simply is no standard (at least that I am aware of; if
there is, please do let me know!) that lets one store gigantic datasets
and gives fast access to the data in multiple resolutions.

Since you guys have created the standard in light microscopy with
ome.tif, I assume that you have thoughts on what a new standard (hdf5
based?) should look like.  In any case, I am very much looking forward
to hearing your thoughts and I will be happy to help avoid a wild growth
of different formats that we will have to live with for years to come if
we do not take action soon.

Best,

Nico







--
--
-----------------------------------------------------------
Johan Henriksson, PhD
Karolinska Institutet / European Bioinformatics Institute (EMBL-EBI)
Labstory - Integrated laboratory documentation and databases (www.labstory.se)
http://mahogny.areta.org  http://www.endrov.net


The University of Dundee is a registered Scottish Charity, No: SC015096