[ome-users] File format for HTS/HCS data

Sebastien Besson (Staff) s.besson at dundee.ac.uk
Mon Oct 17 11:56:17 BST 2016


Hi Kai, Mario, Christian and others,

As of 2016, the only native OME file format available for storing HCS data remains OME-TIFF. To reduce the number of TIFF files, it is possible to either use the BigTIFF extension [1] or use multi-image TIFFs, for instance by storing one TIFF per well containing 5D images. Interestingly, this is the approach taken by some manufacturers, such as Olympus with ScanR [2].

The use of HDF file formats has certainly been raised on the public mailing lists [3,4], at several conferences and at recent OME Users Meetings. This topic is still under investigation on our side and no concrete roadmap has been defined.

Based on our experience with designing file formats, drafting a specification is necessary but not sufficient. As an example, we still deal weekly with invalid OME-TIFF files produced by the community. In our experience, any format that will be used by the scientific community also needs reference writing, reading and validation tools that support and document it.

This is exactly why our recent efforts have focused on building a reference implementation for writing OME-TIFF in C++. This library is now up to date with the latest model, and we will use it as a driver for any necessary extensions, both to support more complex heterogeneous metadata and to support new container formats like HDF that offer solutions for HCS, multi-resolution, and multi-view data.

Our hope is to partner with others who are working on scalable container formats and provide a common method of discovering & defining the metadata in them. Suggestions and collaborations are most welcome.

Best,
Sebastien


[1] https://www.openmicroscopy.org/site/support/ome-model/ome-tiff/file-structure.html
[2] https://www.openmicroscopy.org/community/viewtopic.php?f=13&t=8084&start=10#p17530
[3] http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2009-February/001158.html
[4] http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2012-May/thread.html#2220

On 13 Oct 2016, at 13:46, Christian Carsten Sachs <c.sachs at fz-juelich.de> wrote:

Hi Mario, hi Kai,

joining the discussion as I am also very interested in an HDF5-based
container format.

Given the data is to be assembled outside of an existing microscopy
software, I don't think it makes much sense to 'emulate' the structure
of a particular proprietary software's HDF format.
As you state it, a new reader would be the way to go.

There used to be some discussion about putting OME metadata in HDF5
containers, i.e. creating OME-HDF files (analogous to OME-TIFF), with
discussions on the mailing list e.g. in 2009 [1] and 2012 [2].

Therefore I wonder if someone from the OME team could elaborate whether
progress has been made in this direction. I'd be very interested in
developments in that direction.

Best regards,
Christian Sachs

[1]
http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2009-February/001152.html
[2]
http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2012-May/002220.html

On 10/13/2016 02:37 PM, Mario Emmenlauer wrote:

Dear Kai,

On 13.10.2016 14:01, Kai Schleicher wrote:
One limitation for us is that whatever container format we choose must be
*read by Bio-Formats*.

This is a very valid point. I'm also only aware of CellH5 and Imaris Format.
But I would not see this as a big limitation. HDF5 support is already in
Bio-Formats, so adding support for a specific "formatting" of HDF5 is rather
easy. If you choose any HDF5 based format, even if it's not currently supported
in Bio-Formats, adding support will be relatively easy. But that's just me :)

But can you elaborate what your use-case is? Let's take an example: you have
a container for a plate with 10,368 images. Do you just want to read an image
with a known name? Or do you need a user interface to select the image from
the container, for example in Fiji? If a user interface is needed, how
elaborate does it need to be, to pick the images? Would a scrollable listing
be sufficient, or should it rather be more comfortable, like a plate layout,
or something more fancy?

Cheers,

  Mario



Being able to write the same format with Bio-Formats would be great too, but
we could probably work around that.

This requirement is only met by CellH5, if I am not mistaken. Are there
alternatives for Bio-Formats?

Thanks for your input and cheers,
Kai



On 10/13/2016 11:30 AM, Mario Emmenlauer wrote:
Dear Kai,

I have a slightly lengthy reply because we have been doing this for a while,
with certain pros and cons. We have stored a number of single HCS images in
HDF5 containers, to make them better suited for storage and archiving.
Our typical data set is a 384-well plate with 9 sites per well and 3-4
channels. Therefore we have a total of ~12,000 images per plate. Currently we
host some 5500 plates.
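
The per-plate figure follows from multiplying out the plate geometry quoted above. A minimal sketch of that arithmetic, which also shows how the number of files on disk changes with the packing strategy; the function names and the three packing labels are illustrative assumptions, not part of any OME or H5AR tool:

```python
# Plate geometry taken from the message above; names below are illustrative.
WELLS = 384            # 384-well plate
SITES_PER_WELL = 9
CHANNEL_OPTIONS = (3, 4)   # "3-4 channels"


def images_per_plate(wells, sites, channels):
    """One 2D image per (well, site, channel) combination."""
    return wells * sites * channels


def files_per_plate(wells, sites, channels, packing):
    """Number of files on disk for a hypothetical packing strategy."""
    if packing == "per-image":
        return images_per_plate(wells, sites, channels)
    if packing == "per-well":
        return wells
    if packing == "per-plate":
        return 1
    raise ValueError(f"unknown packing: {packing!r}")


if __name__ == "__main__":
    for channels in CHANNEL_OPTIONS:
        total = images_per_plate(WELLS, SITES_PER_WELL, channels)
        print(f"{channels} channels: {total} images per plate")
    for packing in ("per-image", "per-well", "per-plate"):
        n = files_per_plate(WELLS, SITES_PER_WELL, 3, packing)
        print(f"{packing}: {n} files")
```

With 3 channels this gives 10,368 images and with 4 channels 13,824, which brackets the "~12,000" quoted above.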

When we went to HDF5, we observed a number of significant effects. First,
depending on the block size of the file system of the storage, we use less
storage space! That's because many storage systems use a larger block size,
which effectively "wastes" a bit of space on every image. We could gain up to
20% space by combining the images into one container.
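
The block-size effect can be sketched with a little arithmetic. The following assumes a simplified model in which every file occupies a whole number of blocks; real file systems (and the HDF5 container itself) add their own overheads, and the image size and block size in the usage part are made-up values for illustration only:

```python
def allocated_bytes(file_size, block_size):
    """Bytes occupied on disk when space is handed out in whole blocks."""
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks * block_size


def slack_fraction(file_sizes, block_size):
    """Fraction of the allocated space that holds no data."""
    allocated = sum(allocated_bytes(size, block_size) for size in file_sizes)
    if allocated == 0:
        return 0.0
    return (allocated - sum(file_sizes)) / allocated


if __name__ == "__main__":
    # Hypothetical numbers: 12,000 images of 2.2 MB each, 1 MiB blocks.
    image_sizes = [2_200_000] * 12_000
    block = 1024 * 1024

    many_files = slack_fraction(image_sizes, block)
    one_container = slack_fraction([sum(image_sizes)], block)
    print(f"slack with one file per image: {many_files:.1%}")
    print(f"slack with one container:      {one_container:.4%}")
```

In this model the waste per file is at most one block, so packing everything into one container reduces the total slack from "up to one block per image" to "up to one block per plate".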

Then we could also significantly improve network transfer rates with the
containers. It's mostly the storage read/write rates that go up, and we could
improve throughput up to 3-fold! This is also great for archiving and backup.

However, there are also downsides. First, it's more cumbersome to access the
images, because you add a layer of complexity. For us, the HDF5 support is
built into the storage database, so when we browse images on the web and
download them, the HDF5 is transparent. But every once in a while we need to
access the images on the disk. In order to do that, we now use ImageJ, or
an archive extractor, or a Matlab HDF5 reader that we implemented ourselves.
It works, but it's good to know that the user experience is not the same.

In my humble opinion the benefits outweigh the downsides. And I'd recommend
an HDF5 based format, because it's broadly supported, and very open! Many large
players build on it, like NASA, so I hope it will be supported for long. Also
the HDF5 support in Fiji/ImageJ is quite good. However, this still leaves you
with the problem of how you'd like to "format" your HDF5 file. This is a separate
problem! Think of it like a zip-archive: it's up to you what folder layout you
have inside the zip container, and this can still very much impact the
properties of working with the data.
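
To make the zip analogy concrete, here is a small sketch that uses Python's standard zipfile module as a stand-in for the container. It is not HDF5 and not the H5AR layout; the "well/site_channel" naming scheme and all function names are assumptions chosen for illustration. The point is only that the internal layout decides how easily one well can be listed or one image read:

```python
import io
import zipfile


def build_container(images):
    """Pack {entry name: payload bytes} into one in-memory zip container."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, payload in sorted(images.items()):
            zf.writestr(name, payload)
    return buffer.getvalue()


def list_well(container, well):
    """List the entries stored under one well 'folder'."""
    prefix = f"{well}/"
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return sorted(n for n in zf.namelist() if n.startswith(prefix))


def read_image(container, name):
    """Read one entry from the container by its full internal name."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return zf.read(name)


if __name__ == "__main__":
    # Hypothetical layout: one folder per well, one entry per site/channel.
    plate = {
        "A01/s1_c1.tif": b"pixel data 1",
        "A01/s1_c2.tif": b"pixel data 2",
        "B02/s1_c1.tif": b"pixel data 3",
    }
    container = build_container(plate)
    print(list_well(container, "A01"))
    print(read_image(container, "B02/s1_c1.tif"))
```

A flat layout without the well folder would hold the same data, but selecting all images of one well would then require parsing every entry name.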

We chose the "H5AR" formatting (and library) from SIS, because it's easy to use,
and because they are the providers of Java HDF5 for Fiji/ImageJ. This is also
the format used by openBIS. It's actively developed and quite mature, we did
not encounter any problems. CellH5 would be a different alternative. And there
is the BigDataViewer HDF5 format.
I found H5AR very easy to use, it's really not much more than a container.
In the end, the ease of use decided it for us. If you find this format
worthwhile, I can also provide you with Matlab and Python readers for the HDF5
that transparently handle the container. So with the readers, you can work
with the container as if it were a directory:
   % Example: Matlab dir() and imread() wrappers:
   vFileList = dirh5('/path/to/data.h5ar');
   vImage = imreadh5(['/path/to/data.h5ar/' vFileList(1).name]);


Cheers,

   Mario




On 12.10.2016 16:40, Kai Schleicher wrote:
Hi,

We are looking for a container-file format to store HTS/HCS data. This format
needs to be read and written by Bio-Formats.

The reason is that our storage file system deals much better with a few large
files than with many small files. Hence we'd expect to increase usability and
performance drastically when handling the data.

The output files of our microscope are for example *.HTD (metadata / structure)
next to *.TIF (images and thumbnails). Here each position, channel, time-point
and plane is saved as an individual *.TIF. This creates a lot of images for each
multi-well plate screen.

So far we have been looking into CellH5 which looks promising but development
appears discontinued (correct me if I am wrong). Are there alternatives?

Thanks for your help and cheers,
Kai



Best regards,

   Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
D-81669 München                          http://www.biodataanalysis.de/
_______________________________________________
ome-users mailing list
ome-users at lists.openmicroscopy.org.uk
http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-users


