[ome-users] File format for HTS/HCS data

Mario Emmenlauer mario at emmenlauer.de
Thu Oct 13 10:30:25 BST 2016


Dear Kai,

I have a slightly lengthy reply because we have been doing this for a while,
with certain pros and cons. We have stored a number of single HCS images in
HDF5 containers to make them better suited for storage and archiving.
Our typical data are 384-well plates with 9 sites per well and 3-4 channels.
Therefore we have a total of ~12,000 images per plate. Currently we host some
5500 plates.
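
As a quick sanity check of those numbers, here is a small Python sketch. It
assumes one image file per well, site and channel (no extra time points or
z-planes), which is my simplification for illustration:

```python
# Rough image count per plate for the layout described above.
# Assumption: exactly one image file per (well, site, channel).
WELLS_PER_PLATE = 384
SITES_PER_WELL = 9
CHANNELS = (3, 4)  # "3-4 channels"

images_per_plate = [WELLS_PER_PLATE * SITES_PER_WELL * c for c in CHANNELS]
print(images_per_plate)  # [10368, 13824], i.e. roughly 12,000 on average

# With ~12,000 images per plate and some 5500 plates, the number of
# individual files is on the order of tens of millions.
total_files = 12_000 * 5500
print(total_files)  # 66000000
```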

When we moved to HDF5, we observed a number of significant effects. First,
depending on the block size of the storage file system, we use less storage
space! That's because many storage systems use a large block size, which
effectively "wastes" a bit of space at the end of every image file. We could
save up to 20% space by combining the images into one container.
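
To illustrate the block-size effect, here is a minimal Python sketch. It
assumes the simplest possible model, where every file occupies a whole number
of blocks; real file systems differ in the details (metadata, inlining of small
files, compression), and the container has some overhead of its own. The
function names and example sizes are made up for illustration:

```python
def allocated_size(file_size: int, block_size: int) -> int:
    """Bytes occupied on disk if a file is stored in whole blocks."""
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks * block_size


def slack_fraction(file_sizes, block_size: int) -> float:
    """Fraction of the allocated space that is padding, not image data."""
    file_sizes = list(file_sizes)
    allocated = sum(allocated_size(s, block_size) for s in file_sizes)
    return (allocated - sum(file_sizes)) / allocated


# Hypothetical example: 12,000 images of 300,000 bytes each,
# compared for two hypothetical block sizes.
sizes = [300_000] * 12_000
for block_size in (64 * 1024, 1024 * 1024):
    print(block_size, round(slack_fraction(sizes, block_size), 3))
```

The larger the block size relative to the image size, the larger the fraction
of space that is lost when every image is a separate file.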

Then we could also significantly improve network transfer rates with the
containers. It's mostly the storage read/write rates that go up, and we could
improve throughput up to 3-fold! This is also great for archiving and backup.

However, there are also downsides. First, it's more cumbersome to access the
images, because you add a layer of complexity. For us, the HDF5 support is
built into the storage database, so when we browse images on the web and
download them, the HDF5 is transparent. But every once in a while we need to
access the images on the disk. In order to do that, we now use ImageJ, or
an archive extractor, or a Matlab HDF5 reader that we implemented ourselves.
It works, but it's good to know that the user experience is not the same.

In my humble opinion the benefits outweigh the downsides. I'd recommend an
HDF5-based format because it's broadly supported and very open! Many large
players build on it, like NASA, so I hope it will be supported for a long
time. Also, the HDF5 support in Fiji/ImageJ is quite good. However, this still
leaves you with the problem of how to "format" your HDF5 file. This is a
separate problem! Think of it like a zip archive: it's up to you what folder
layout you have inside the zip container, and this can still very much impact
how easy the data are to work with.
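
To make the zip analogy concrete, here is a small Python sketch using the
standard zipfile module (not HDF5 itself). The well, site and channel names
and both folder layouts are made up for illustration; they are not the layout
of any of the formats mentioned here:

```python
import io
import zipfile


def build_archive(names):
    """Write empty placeholder members into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")  # placeholder instead of pixel data
    buf.seek(0)
    return zipfile.ZipFile(buf)


# Hypothetical identifiers, for illustration only.
wells, sites, channels = ["A01", "A02"], [1, 2], ["c1", "c2"]

# Layout 1: one folder per well.
by_well = build_archive(
    f"{w}/s{s}_{c}.tif" for w in wells for s in sites for c in channels
)
# Layout 2: one folder per channel.
by_channel = build_archive(
    f"{c}/{w}_s{s}.tif" for w in wells for s in sites for c in channels
)

# "All images of well A01" is a simple prefix match in layout 1 ...
in_a01 = [n for n in by_well.namelist() if n.startswith("A01/")]
# ... but needs every member name to be parsed in layout 2.
in_a01_alt = [
    n for n in by_channel.namelist() if n.split("/")[1].startswith("A01_")
]
print(in_a01)
print(in_a01_alt)
```

Both archives hold the same images, but which questions are cheap to answer
depends on the layout. The same holds for the group structure inside an HDF5
file.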

We chose the "H5AR" formatting (and library) from SIS because it's easy to
use, and because they are the providers of Java HDF5 for Fiji/ImageJ. This is
also the format used by openBIS. It's actively developed and quite mature; we
did not encounter any problems. CellH5 would be a different alternative. And
there is the BigDataViewer HDF5 format.
I found H5AR very easy to use; it's really not much more than a container.
The ease of use is what finally decided it for us. If you find this format
worthwhile, I can also provide you with Matlab and Python readers for the
HDF5 that transparently handle the container. So with the readers, you can
work with the container as if it were a directory:
    % Example: Matlab dir() and imread() wrappers:
    vFileList = dirh5('/path/to/data.h5ar');
    vImage = imreadh5(['/path/to/data.h5ar/' vFileList(1).name]);


Cheers,

    Mario




On 12.10.2016 16:40, Kai Schleicher wrote:
> Hi,
> 
> We are looking for a container-file format to store HTS/HCS data. This format
> needs to be read and written by Bio-formats.
> 
> The reason is that our storage file system deals much better with a few large
> files than with many small files. Hence we'd expect to increase usability and
> performance drastically when handling the data.
> 
> The output files of our microscope are, for example, *.HTD (metadata / structure)
> next to *.TIF (images and thumbnails). Here each position, channel, time-point
> and plane is saved as an individual *.TIF. This creates a lot of images for each
> multi-well plate screen.
> 
> So far we have been looking into CellH5 which looks promising but development
> appears discontinued (correct me if I am wrong). Are there alternatives?
> 
> Thanks for your help and cheers,
> Kai
> 



--
BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
D-81669 München                          http://www.biodataanalysis.de/

