[ome-users] HCS File Format / HDF5

Sebastien Besson (Staff) s.besson at dundee.ac.uk
Wed Oct 17 09:22:45 BST 2018


Hi Manuel and Mario,

From the OME side, here are a couple of pointers on open HCS/HDF5-based formats and their compatibility.

The open CellH5 format developed by EMBL [1] is currently supported by Bio-Formats [2]. Data stored in this
format can be read through our APIs and managed in an OMERO server. Bio-Formats can also write CellH5
files, using a writer that the developers of the format contributed alongside the reader.

Re OME and HDF5, since the beginning of the year we have been modernizing our open file formats in order
to support modalities like multi-resolution images and to add support for new, performant binary data
containers. This is part of a larger strategy that OME has embarked on where we explicitly work to support a
range of binary containers, in order to address the issues created by the growing number of imaging modalities
that generate large, complex multi-dimensional data. For example, we are currently extending our OME-TIFF
specification to support pyramidal levels and will soon release open readers and writers for these formats. This
work targets the rapidly emerging whole slide imaging and digital pathology community.
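As a rough illustration of what "pyramidal levels" means in practice, here is a small sketch that computes the dimensions of each resolution level of a large whole-slide image. It assumes downsampling by a factor of 2 per level (rounding up) until the image fits into a single 512 x 512 tile; both numbers are illustrative assumptions, not values taken from the OME-TIFF specification, and `pyramid_levels` is a hypothetical helper.

```python
def pyramid_levels(width: int, height: int, tile: int = 512, factor: int = 2) -> list:
    """Return (width, height) of every resolution level, full resolution first.

    Assumption: each level is downsampled by an integer `factor`, rounding up,
    until the whole image fits into a single `tile` x `tile` tile.
    """
    if min(width, height, tile) < 1 or factor < 2:
        raise ValueError("dimensions must be positive and factor >= 2")
    levels = [(width, height)]
    while width > tile or height > tile:
        width = -(-width // factor)    # ceiling division keeps edge pixels
        height = -(-height // factor)
        levels.append((width, height))
    return levels


# Hypothetical 100,000 x 80,000 pixel whole-slide image.
for level, (w, h) in enumerate(pyramid_levels(100_000, 80_000)):
    print(level, w, h)
```

Under these assumptions the hypothetical slide yields nine levels, the smallest being 391 x 313 pixels; a viewer can then fetch only the level that matches the current zoom.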

We have also explored adding support for binary vessels like HDF5, KLB and others. Part of this work was
presented during the 2018 OME Users Meeting [3]. This work results from an intersection of our desire to expand
support for binary vessels and the requirement to support datasets that will soon be published in IDR [4]. HDF5
has several advantages (Mario has discussed these), but a full implementation requires the development of yet
another file format (YAFF), something that is anathema to OME’s philosophy. There are already useful
HDF5-based formats, e.g., BigDataViewer [5], that we aim to support in Bio-Formats. Regardless, writing
an HDF5-based format requires substantial work. A full technical proposal [6] details the series of changes
needed at the model and API levels. Unlike the OME-TIFF pyramidal work, full
support for an OME-HDF5 or any other new binary vessel will necessitate breaking changes across our entire
software stack, including a new OME Data Model. We will keep reporting on progress via blog posts as well as
during our next Users Meeting.

Best,
Sebastien, Jason & The OME Team


[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3673213/
[2] https://docs.openmicroscopy.org/bio-formats/5.9.2/formats/cellh5.html
[3] https://downloads.openmicroscopy.org/presentations/2018/Users-Meeting/Workshops/NewFileFormats/BinaryVessels/
[4] https://www.cell.com/cell/pdf/S0092-8674(18)31243-1.pdf
[5] https://github.com/bigdataviewer
[6] https://docs.google.com/document/d/1vCtIrxtlbA-SWPJVzwHOSnbfkbGKJAbijhvhvlSLMEw/edit#heading=h.r20td2bzgkgw

> On 11 Oct 2018, at 15:45, Manuel Stritt <manuel.stritt at idorsia.com> wrote:
>
>
> Dear Mario,
>
> thanks a lot for the detailed explanation with concrete measurements, very valuable!
>
> So did you (or someone else) try CellH5, or did you use your own HDF5 structure?
>
> Any plans and/or recommendations from the OME side for Bio-Formats/OMERO integration?
>
> Regards,
> Manuel
>
>
> Dear Manuel,
>
> On 10.10.2018 08:46, Manuel Stritt wrote:
> > Dear all,
> >
> > we're currently rethinking our high content screening workflow and thus I'm thinking about a good way to store all data.
> > My idea is to create one e.g. HDF5 file per plate which contains all the images + meta data.
> > Do you have any recommendations regarding that (I think Mario once triggered a discussion around that topic)?
> >
> > If some kind of HDF5 is considered as solution - then still a structure specification would be needed.
> > Kai and Nico mentioned the CellH5 format, a flavor of HDF5.
> > As far as I can see this is supported by Bio-Formats/OMERO. However, it's still unclear how
> > to pack the output of, e.g., an Opera machine into the CellH5 format in a convenient way.
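Since a structure specification is the missing piece here, the following sketch shows one way a per-plate file could map plate coordinates to internal group/dataset paths. The layout (`/plate/<row>/<column>/field_<f>/t<t>/c<c>`) and the names `ImageKey` and `dataset_path` are hypothetical illustrations; they are not the CellH5 layout and not an OME specification. In an HDF5 file, each returned path would name one image dataset.

```python
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageKey:
    """Identifies one image in a plate acquisition (hypothetical fields)."""
    row: int            # 0-based plate row, rendered as a letter (0 -> "A")
    column: int         # 0-based plate column, rendered 1-based and zero-padded
    field: int          # field of view within the well
    channel: int        # acquisition channel
    timepoint: int = 0  # time point, 0 for single-time-point screens


def dataset_path(key: ImageKey) -> str:
    """Map an image to a path inside a single per-plate container file.

    Illustrative layout only: /plate/<row>/<column>/field_<f>/t<t>/c<c>.
    It is neither the CellH5 layout nor an OME specification.
    """
    if not 0 <= key.row < 26:
        # Rows beyond "Z" (e.g. 1536-well plates) are not handled in this sketch.
        raise ValueError("this sketch only handles rows A-Z")
    row = string.ascii_uppercase[key.row]
    return (f"/plate/{row}/{key.column + 1:02d}"
            f"/field_{key.field}/t{key.timepoint}/c{key.channel}")


print(dataset_path(ImageKey(row=1, column=2, field=0, channel=3)))
# /plate/B/03/field_0/t0/c3
```

Keeping the mapping in one function means the layout can be documented, versioned and changed in a single place, which is essentially what a structure specification would pin down.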
> >
> > In addition there was a discussion about an official OME-HDF5 format, right?
> > What's the current status for that?
>
> In our testing, the containers provide a big benefit for the file
> system and storage back end, especially when a large NAS storage is
> used. For a typical desktop application with a local spinning disk,
> there was virtually no difference in speed or file system overhead.
>
> On a NAS, we got a 30% reduction in storage space due to reduced chunk
> size overhead. This is a tunable parameter, so your mileage may vary!
> It may be anything between 0% and up to 50% (or more) reduction,
> depending on your image file size and the file system chunk size.
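The chunk size overhead can be made concrete with a little arithmetic: a file system allocates whole blocks, so each small file wastes the unused tail of its last block, while a single container wastes only one tail in total. The sketch below compares the two cases. The block sizes and the plate of 1000 images of 600 KiB are hypothetical, and the container's own internal metadata (headers, indices) is ignored, so real savings would be somewhat smaller.

```python
from __future__ import annotations

KiB = 1024
MiB = 1024 * KiB


def allocated_bytes(size: int, block_size: int) -> int:
    """Space reserved on disk when only whole blocks can be allocated."""
    return -(-size // block_size) * block_size  # ceiling division


def container_savings(file_sizes: list[int], block_size: int) -> float:
    """Fraction of allocated space saved by packing all files into one container.

    Simplification: ignores container-internal metadata and file system
    features such as tail packing, inline data or compression.
    """
    separate = sum(allocated_bytes(s, block_size) for s in file_sizes)
    packed = allocated_bytes(sum(file_sizes), block_size)
    return 1 - packed / separate


# Hypothetical plate: 1000 images of 600 KiB each.
plate = [600 * KiB] * 1000
print(f"1 MiB blocks: {container_savings(plate, 1 * MiB):.1%}")  # 41.4%
print(f"4 KiB blocks: {container_savings(plate, 4 * KiB):.1%}")  # 0.0%
```

The two outcomes illustrate why the saving is so variable: it depends entirely on how the image file sizes relate to the block size of the storage system.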
>
> Furthermore, we got significantly faster data transfer rates because
> the data is more contiguous. An rsync transfer of a full plate from/to
> the NAS was about twice as fast when transferring the container as when
> transferring the individual files.
>
> Last but not least, the container can apply transparent compression without
> modifying the original file, so you can (transparently) apply bzip2 or
> other compression schemes to the raw file while keeping the full
> (proprietary) file intact and unchanged.
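A minimal sketch of this idea, using Python's standard zipfile module as a stand-in container (an HDF5-based container would use its own filter pipeline instead): the raw vendor files are stored byte-for-byte, bzip2 compression is applied by the container, and a SHA-256 comparison confirms that each file comes back unchanged. The `pack`/`verify` helpers, the file name and its contents are hypothetical.

```python
from __future__ import annotations

import hashlib
import io
import zipfile


def pack(files: dict[str, bytes]) -> bytes:
    """Store raw vendor files unmodified in one bzip2-compressed container.

    zipfile is only a stand-in container here; compression is applied by the
    container, the file bytes themselves are never rewritten.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_BZIP2) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def verify(container: bytes, files: dict[str, bytes]) -> bool:
    """Return True if every file reads back with an identical SHA-256 digest."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return all(
            hashlib.sha256(zf.read(name)).digest() == hashlib.sha256(data).digest()
            for name, data in files.items()
        )


raw_files = {"A01/field0.flex": bytes(range(256)) * 4000}  # hypothetical vendor file
blob = pack(raw_files)
print(verify(blob, raw_files))                              # True
print(len(blob) < sum(len(d) for d in raw_files.values()))  # True
```

zipfile also checks a CRC-32 on every read, so the explicit hash is mainly useful when checksums are recorded elsewhere, for example in a database.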
>
>
> All this also comes at a price. There were situations where we would
> have liked to access images and it was not as easy as we had hoped. It's
> good if the database supports downloading individual files, to make
> image access transparent for end users. But when the database is
> down, image access becomes quite hard for end users. Furthermore, there
> is an unlucky range of ~100-1000 files where access is always a hassle,
> because manual download becomes too cumbersome and container extraction
> is typically not very convenient.
>
> Long story short: containers are great, but I would really love to
> see them in combination with a simple, graphical, cross-platform file
> management utility that allows adding and extracting files. There are
> some such utilities, like HDFView [1], but they are not yet comparable
> to something like WinZip/WinRAR/7-Zip/...
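For the awkward range of ~100-1000 files, one option is to extract only the subset that is needed (for example one well) rather than the whole container. The sketch below again uses Python's zipfile as a stand-in container; the helper `extract_matching` and the `<well>/<field>_<channel>.tif` naming are hypothetical.

```python
from __future__ import annotations

import fnmatch
import tempfile
import zipfile
from pathlib import Path


def extract_matching(container: Path, pattern: str, dest: Path) -> list[Path]:
    """Extract only the members whose names match a shell-style pattern."""
    with zipfile.ZipFile(container) as zf:
        names = [n for n in zf.namelist() if fnmatch.fnmatchcase(n, pattern)]
        return [Path(zf.extract(n, path=dest)) for n in names]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical per-plate container holding three images from two wells.
    plate = Path(tmp) / "plate.zip"
    with zipfile.ZipFile(plate, "w") as zf:
        for name in ("B03/f0_c0.tif", "B03/f0_c1.tif", "C04/f0_c0.tif"):
            zf.writestr(name, b"placeholder pixel data")
    extracted = extract_matching(plate, "B03/*", Path(tmp) / "out")
    print(sorted(p.name for p in extracted))  # ['f0_c0.tif', 'f0_c1.tif']
```

A graphical tool of the kind described above would essentially wrap this selection step in a file browser.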
>
> [1] https://support.hdfgroup.org/HDF5/Tutor/hdfview.html
>
> Best regards,
>
>     Mario Emmenlauer
>
>
> --
> BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
> Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
> D-81669 München                          http://www.biodataanalysis.de/
>
> _______________________________________________
> ome-users mailing list
> ome-users at lists.openmicroscopy.org.uk
> http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-users



