[ome-devel] Fwd: File format for large data sets stored at multiple resolutions

Lee Kamentsky leek at broadinstitute.org
Tue May 12 13:44:55 BST 2015


Chipping in my 2 cents since I will not be there. Our next release supports
CellH5, and we have had very good luck with HDF5 in general. Pyramidal
storage isn't quite free, but it is easily realizable through slicing, and
perhaps CellH5 can
evolve to include pyramidal levels. Regarding database / file-format, I
have been considering how to make that debate a little more agnostic,
especially as we enter into the web-services era. It would be really useful
for us to have wire format standards for our community's data and possibly
some standardization of web interfaces - we're going to be moving
CellProfiler in that direction in the next couple of years.
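[Editor's note: a minimal sketch of what "pyramidal through slicing" could mean in practice, using nearest-neighbor downsampling on plain Python lists in place of a real HDF5 dataset; the function names are invented for illustration.]

```python
def downsample(plane, factor=2):
    """Nearest-neighbor downsample: keep every `factor`-th row and column."""
    return [row[::factor] for row in plane[::factor]]

def build_pyramid(plane, levels=3):
    """Return [full-res, half-res, quarter-res, ...] versions of a 2D plane."""
    pyramid = [plane]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

# An 8x8 plane; each pyramid level halves both dimensions.
base = [[x + 8 * y for x in range(8)] for y in range(8)]
pyramid = build_pyramid(base)
print([(len(p), len(p[0])) for p in pyramid])  # [(8, 8), (4, 4), (2, 2)]
```

In a real HDF5 file, each level would simply be stored as its own (chunked) dataset, which is why pyramids are "easily realizable" even though the format has no built-in pyramid concept.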

Some more notes here:
https://github.com/imagej/imagej-server/issues/1

--Lee

On Tue, May 12, 2015 at 4:17 AM, Jason Swedlow (Staff) <
j.r.swedlow at dundee.ac.uk> wrote:

>  Hi Nico, Johan et al-
>
>  Apologies for the delay in responding.  Nico’s email started a lot of
> discussion in the OME team.  I saw Nico and Curtis Rueden a couple of days
> after this was sent at AQLM (
> http://www.mbl.edu/education/special-topics-courses/analytical-quantitative-light-microscopy/),
> and we had a long discussion about this issue over chips, salsa, and PSFs.
>
>  As many of you know, this has been a hot topic of late.  Several new and
> some established technologies (LSFM, DigPath, HCS and others) are now
> routinely generating heterogeneous (binary pixel data, metadata, and
> analytics) multi-dimensional, multi-TB datasets.  Within OME, we’ve been
> discussing how we approach this trend— whether we amend OME-TIFF, define a
> new format (mindful of a lot of work by others, in particular Cellh5 (
> http://www.cellh5.org/), BDV (http://fiji.sc/BigDataViewer), OpenSlide (
> http://openslide.org/) and others), or just wait for someone else to
> generate yet another file format (YAFF®) or in all likelihood several new
> file formats (SNFFs®) and doggedly support them all in Bio-Formats.
>  Johan’s proposed pointer-based solution is, if I have things correct,
> already implemented in Micro-Manager’s OME-TIFF, for exactly the reason
> described (apologies to Nico and the MM team if I have this incorrect—
> please do provide the accurate description of what is going on there if I
> have failed).
>
>  AFAICT, there is a rather old, well-worn debate between the filesystem
> and database camps on this issue.  To my mind, Jim Gray and colleagues
> captured this tension most clearly and accurately in 2005 (
> http://research.microsoft.com/pubs/64537/tr-2005-10.pdf).  It’s almost
> certainly true that both approaches will evolve side by side, and we (where
> we = the community) should try to develop both solutions— they each have
> their place and utility.  OME’s version of this is to develop a data spec
> (e.g., OME-TIFF), an I/O library (Bio-Formats), and applications that use
> the format (OMERO).  We are committed to this strategy for any spec we
> develop.  That might explain, but not excuse, our rather slow approach on
> this issue.  We insist on the spec *and* its many implementations.
>
>  Mindful of all this, OME's priority is building tools that are as
> useful, generic and performant as possible.  In this case, that means
> developing a format that works in many different domains and that includes
> support for multi-res pixel sets, acquisition metadata, ROIs, trajectories,
> etc.  In discussing this with Nico and Curtis, we agreed the obvious—
> anything we build has to be done in steps.  The over-riding immediate
> problem that several people face is support for large, multi-res, multi-D
> pixel sets.  The existing (partial) solutions are worth considering, but
> anything we do must also support both Java and native environments— OME is
> committed to at least bypassing, if not removing, the barriers between the
> Java and native worlds, where we have the resources to do so.
>
>  We’ve added this topic to the workshops at our upcoming Users meeting
> and welcome input there (
> https://www.openmicroscopy.org/site/community/minutes/meetings/10th-annual-users-meeting-june-2015).
>  It looks like we will have a very strong turnout for the meeting.  We’d
> encourage anyone interested to join us there, but obviously also welcome
> input on this list, Forums, etc.  We’ll report back with our plan for
> addressing this very important point.
>
>  As always, thanks for your support.
>
>  Cheers,
>
>  Jason
>
>   --------------------
>
> Centre for Gene Regulation & Expression | Open Microscopy Environment
> | University of Dundee
>
>
>
> Phone:  +44 (0) 1382 385819
>
> email: j.swedlow at dundee.ac.uk
>
>
>
> Web: http://www.lifesci.dundee.ac.uk/people/jason-swedlow
>
> Open Microscopy Environment: http://openmicroscopy.org
>
>
>
>
>   From: Johan Henriksson <mahogny at areta.org>
> Reply-To: OME Development <ome-devel at lists.openmicroscopy.org.uk>
> Date: Wednesday, 6 May 2015 22:45
> To: OME Development <ome-devel at lists.openmicroscopy.org.uk>
> Subject: Re: [ome-devel] Fwd: File format for large data sets stored at
> multiple resolutions
>
>      Hi Nico!
>
>  First of all, have you tried the pyramid compression of JPEG2000? (I
> have not!)
>
>  Second, the last time I tried large datasets in OME-TIFF it was a huge
> issue. I tried to convert our 40 GB+ recordings to OME-TIFF and got
> indexing times from hell (up to 10 minutes). I never had time to properly
> investigate this. Part of the problem might be that I changed the OME
> writer to add in JPEG-compressed data (since we already had a lot of
> JPEGs, and I did not feel like converting them to PNGs).
>
>  JPEG2000 pyramids would help you with huge 2D images but not with huge
> 5D datasets. I believe the problem (anyone please correct me here) is that
> the OME reader first indexes the TIFF file, but in the worst case does so
> by going through the entire file to find where each plane is. This is in
> no way fast in the current implementation, as I suspect it jumps through
> the entire dataset, since TIFFs are essentially a linked list of planes.
> If your compressed output file is 5 GB+, this step alone is really slow.
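[Editor's note: the linked-list behaviour described here can be illustrated with a toy model, in which each IFD records only the offset of the next one, so discovering every plane's location costs one read per plane. This is a simplified sketch, not real TIFF parsing.]

```python
# Toy model: a TIFF-like file is a chain of IFDs, each knowing only the
# offset of the next IFD.  ifds maps offset -> (plane_index, next_offset).
ifds = {}
offset = 0
N = 1000
for i in range(N):
    next_off = offset + 1 if i < N - 1 else None
    ifds[offset] = (i, next_off)
    if next_off is not None:
        offset = next_off

def index_all_planes(ifds, start=0):
    """Walk the chain; every step corresponds to a seek+read in a real file."""
    reads = 0
    off = start
    plane_offsets = []
    while off is not None:
        _plane, nxt = ifds[off]
        plane_offsets.append(off)
        reads += 1
        off = nxt
    return plane_offsets, reads

offsets, reads = index_all_planes(ifds)
print(reads)  # 1000 -- one read per plane just to learn where everything lives
```

With planes scattered through a multi-gigabyte compressed file, each of those reads is a disk seek, which is consistent with the long indexing times reported above.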
>
>  My solution to this would have been an extension with a special data
> object containing pointers to most or all planes, in a single place; thus
> very few reads would be needed to map all planes. But then I moved to
> another lab and never had time to return to this. I still think it is a
> problem/solution worth reconsidering. If a TIFF reader does not understand
> such a special data object, it would just ignore it, but specialized
> readers could gain a lot of speed by reading it.
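[Editor's note: the proposed single-place pointer object might be sketched as below; the table layout (a count followed by 8-byte offsets) is invented for illustration and is not part of any TIFF or OME specification.]

```python
import struct

def write_offset_table(plane_offsets):
    """Pack all plane offsets into one contiguous block:
    a 4-byte count, then one little-endian 8-byte offset per plane."""
    return struct.pack('<I', len(plane_offsets)) + b''.join(
        struct.pack('<Q', off) for off in plane_offsets)

def read_offset_table(blob):
    """A single read of this block recovers every plane location,
    instead of walking the whole IFD chain."""
    (count,) = struct.unpack_from('<I', blob, 0)
    return list(struct.unpack_from('<%dQ' % count, blob, 4))

# Example plane offsets, including one past the 4 GB boundary.
offsets = [16, 4096, 8192, 1 << 33]
blob = write_offset_table(offsets)
assert read_offset_table(blob) == offsets
```

A legacy reader that does not recognize the object would skip it unharmed, which is exactly the ignore-if-unknown behaviour described above.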
>
>  cheers,
>  Johan
>
>
>
>
>
>
>
> On Mon, May 4, 2015 at 2:08 AM, Nico Stuurman <nico.stuurman at ucsf.edu>
> wrote:
>
>>
>> Dear all,
>>
>> I have been running into more and more individual efforts to create new
>> file formats to deal with large datasets that need to be stored at
>> multiple resolutions to enable fast feedback to the user. Examples are
>> the hdf5 format used by the BigDataViewer plugin by Tobias Pietzsch and
>> Stephan Preibisch, the hdf5 format used by Chimera (UCSF-based package
>> primarily for crystallography and EM that also has amazing capabilities
>> for 3D visualization of light microscopy data), the Micro-Manager
>> SlideExplorer plugin for which Arthur Edelstein developed his own
>> storage system, and the Micro-Manager plugin "Magellan" that Henry
>> Pinkard is developing right now, which also stores multiple-resolution
>> versions of the data on disk. Doubtless, there are many more examples.
>>
>> Even when conversion between these formats is possible (as long as they
>> are reasonably documented), it is time consuming and takes up large
>> amounts of disk space, simply because the data sets have become gigantic.
>> The reasons why everyone designs their own format are also clear: there
>> simply is no standard (at least none that I am aware of; if there is,
>> please do let me know!) that lets one store gigantic datasets with fast
>> access to the data at multiple resolutions.
>>
>> Since you guys have created the standard in light microscopy with
>> OME-TIFF, I assume that you have thoughts on what a new standard (HDF5
>> based?) should look like.  In any case, I am very much looking forward
>> to hearing your thoughts and I will be happy to help avoid a wild growth
>> of different formats that we will have to live with for years to come if
>> we do not take action soon.
>>
>> Best,
>>
>> Nico
>>
>>
>>
>>
>> _______________________________________________
>> ome-devel mailing list
>> ome-devel at lists.openmicroscopy.org.uk
>> http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
>>
>
>
>
> --
> -----------------------------------------------------------
> Johan Henriksson, PhD
> Karolinska Institutet / European Bioinformatics Institute (EMBL-EBI)
> Labstory - Integrated laboratory documentation and databases (
> www.labstory.se)
> http://mahogny.areta.org  http://www.endrov.net
>
>
> The University of Dundee is a registered Scottish Charity, No: SC015096
>
> _______________________________________________
> ome-devel mailing list
> ome-devel at lists.openmicroscopy.org.uk
> http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
>
>