[ome-users] HCS File Format / HDF5

Mario Emmenlauer mario at emmenlauer.de
Wed Oct 10 10:26:59 BST 2018


Dear Manuel,

On 10.10.2018 08:46, Manuel Stritt wrote:
> Dear all,
> 
> we're currently rethinking our high content screening workflow and thus I'm thinking about a good way to store all data.
> My idea is to create one e.g. HDF5 file per plate which contains all the images + meta data.
> Do you have any recommendations regarding that (I think Mario once triggered a discussion around that topic)?
> 
> If some kind of HDF5 is considered as solution - then still a structure specification would be needed. 
> Kai and Nico mentioned the CellH5 format, a flavor of HDF5.
> As far as I can see this is supported by bioformats / Omero. However, it's still unclear how
> to pack the output of, e.g., an Opera machine into the CellH5 format in a convenient way.
> 
> In addition there was a discussion about an official OME-HDF5 format, right? 
> What's the current status for that?

In our testing, the containers provide a big benefit for the file
system and storage back end, especially when a large NAS storage is
used. For a typical desktop application with a local spinning disk,
there was virtually no difference in speed or file system overhead.

On a NAS, we got a 30% reduction in storage space due to reduced chunk
size overhead. The chunk size is a tunable parameter, so your mileage
may vary! The reduction may be anything between 0% and 50% (or more),
depending on your image file size and the file system chunk size.
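
As a rough illustration of where this saving comes from, here is a
minimal sketch that assumes every file occupies a whole number of
storage chunks, and that a container packs the same bytes back to back.
The function names, the 700 KiB image size and the 1 MiB chunk size are
made-up assumptions for illustration, not our measured setup, and the
container's own metadata overhead is ignored.

```python
def allocated_size(file_size: int, chunk_size: int) -> int:
    """Bytes occupied on storage when a file is rounded up to whole chunks."""
    chunks = -(-file_size // chunk_size)  # integer ceiling division
    return chunks * chunk_size


def container_saving(file_sizes, chunk_size: int) -> float:
    """Fraction of allocated space saved by packing all files into one container.

    Assumption: the container stores the payloads back to back and adds
    no overhead of its own.
    """
    separate = sum(allocated_size(size, chunk_size) for size in file_sizes)
    if separate == 0:
        return 0.0
    packed = allocated_size(sum(file_sizes), chunk_size)
    return 1.0 - packed / separate


if __name__ == "__main__":
    # Hypothetical plate: 1000 images of 700 KiB each, 1 MiB chunks.
    sizes = [700 * 1024] * 1000
    print(f"saving: {container_saving(sizes, 1024 * 1024):.1%}")
```

With these made-up numbers the saving comes out at roughly 32%. With
images that are an exact multiple of the chunk size it drops to 0%.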

Furthermore, we got significantly faster data transfer rates because
the data is more contiguous. An rsync transfer of a full plate from/to
the NAS was about twice as fast when transferring the container as when
transferring the individual files.

Last but not least, the container can apply transparent compression
without modifying the original file, so you can (transparently) apply
bzip2 or other compression schemes to the raw file, while keeping the
full (proprietary) file intact and unchanged.
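
To make the idea concrete, here is a minimal sketch that uses Python's
standard zipfile module as a stand-in container. This is not HDF5 and
not the setup we tested. It only shows the principle that the raw files
go in byte for byte, are compressed with bzip2 inside the container,
and come back out unchanged. The function names and the directory
layout are assumptions for illustration.

```python
import zipfile
from pathlib import Path


def pack_plate(image_dir: Path, container: Path) -> int:
    """Store every file below image_dir in a bzip2-compressed container.

    The container must live outside image_dir, otherwise it would be
    picked up while it is being written. Returns the number of files stored.
    """
    count = 0
    with zipfile.ZipFile(container, "w", compression=zipfile.ZIP_BZIP2) as zf:
        for path in sorted(image_dir.rglob("*")):
            if path.is_file():
                # Keep the relative path so the plate layout is preserved.
                zf.write(path, arcname=path.relative_to(image_dir).as_posix())
                count += 1
    return count


def verify_plate(image_dir: Path, container: Path) -> bool:
    """Check that every stored file decompresses to the original bytes."""
    with zipfile.ZipFile(container) as zf:
        for name in zf.namelist():
            if zf.read(name) != (image_dir / name).read_bytes():
                return False
    return True
```

A call such as pack_plate(Path("plate_0001"), Path("plate_0001.zip"))
followed by verify_plate with the same arguments would confirm that the
(proprietary) files are intact. Both paths are hypothetical.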


All this also comes at a price. There were situations where we would
have liked to access images and it was not as easy as we had hoped.
It's good if the database supports download of individual files, to
make the image access transparent for end users. But when the database
is down, image access becomes quite hard for end users. Furthermore,
there is an awkward range of ~100-1000 files where access is always a
hassle, because manual download becomes too cumbersome and container
extraction is typically not very comfortable.
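
For that middle range of a few hundred files, a small script can stand
in for the missing comfortable tool. The following is only a sketch,
again using Python's standard zipfile module as a stand-in container
rather than HDF5. The function name and the per-well naming pattern in
the usage note are assumptions for illustration.

```python
import fnmatch
import zipfile
from pathlib import Path


def extract_matching(container: Path, pattern: str, dest: Path) -> list:
    """Extract only the members whose stored name matches a glob pattern.

    Matching is case-sensitive, and '*' also matches '/' in stored names.
    Returns the sorted list of extracted member names.
    """
    with zipfile.ZipFile(container) as zf:
        wanted = sorted(
            name for name in zf.namelist()
            if fnmatch.fnmatchcase(name, pattern)
        )
        zf.extractall(dest, members=wanted)
    return wanted
```

For example, extract_matching(Path("plate_0001.zip"), "B03/*.tif",
Path("out")) would pull out all images of a hypothetical well B03 in
one call, without touching the rest of the plate.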

Long story short: containers are great, but I would really love to
see them in combination with a simple, graphical, cross-platform file
management utility that allows adding and extracting files. There are
some such utilities like HDFView [1] but they are not yet comparable
to something like WinZip/WinRAR/7Zip/...

[1] https://support.hdfgroup.org/HDF5/Tutor/hdfview.html

Best regards,

    Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
D-81669 München                          http://www.biodataanalysis.de/

