[ome-users] File format for HTS/HCS data

Mario Emmenlauer mario at emmenlauer.de
Thu Oct 13 13:37:34 BST 2016


Dear Kai,

On 13.10.2016 14:01, Kai Schleicher wrote:
> One limitation for us is that whatever container format we choose must be
> *readable by Bio-Formats*.

This is a very valid point. I'm also only aware of CellH5 and the Imaris format.
But I would not see this as a big limitation. HDF5 support is already in
Bio-Formats, so if you choose any HDF5-based format, even one that is not
currently supported, adding support for its specific "formatting" should be
relatively easy. But that's just me :)

But can you elaborate on what your use case is? Let's take an example: you have
a container for a plate with 10,368 images. Do you just want to read an image
with a known name? Or do you need a user interface to select the image from
the container, for example in Fiji? If a user interface is needed, how
elaborate does it need to be for picking images? Would a scrollable listing
be sufficient, or should it be more comfortable, like a plate layout,
or something fancier?

Cheers,

   Mario



> Writing the same format with Bio-Formats would be great too, but we could
> probably work around that.
>
> This requirement is only met by CellH5, if I am not mistaken. Are there
> alternatives that Bio-Formats supports?
> 
> Thanks for your input and cheers,
> Kai
> 
> 
> 
> On 10/13/2016 11:30 AM, Mario Emmenlauer wrote:
>> Dear Kai,
>>
>> I have a slightly lengthy reply because we have been doing this for a while,
>> with certain pros and cons. We have stored a number of single HCS images in
>> HDF5 containers to make them better suited for storage and archiving.
>> Our common data sizes are 384-well plates, 9 sites per well, 3-4 channels.
>> Therefore we have a total of ~12,000 images per plate. Currently we host some
>> 5500 plates.
>>
>> When we went to HDF5, we observed a number of significant effects. First,
>> depending on the block size of the storage file system, we use less
>> storage space! That's because many storage systems use a larger block size,
>> which effectively "wastes" a bit of space on every image. We could gain up to
>> 20% space by combining the images into one container.
>>
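To illustrate the block-size effect described above, here is a minimal Python
sketch. It only models rounding each file up to whole blocks; the image size,
block size, and function names are hypothetical, and the overhead of the
container format itself is ignored, so real savings will differ.

```python
def allocated_bytes(file_size, block_size):
    """Bytes a file occupies on disk when rounded up to whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division on integers
    return blocks * block_size


def container_saving(file_sizes, block_size):
    """Fraction of allocated space saved by storing all files in one container.

    Assumption: the container adds no overhead of its own.
    """
    separate = sum(allocated_bytes(s, block_size) for s in file_sizes)
    combined = allocated_bytes(sum(file_sizes), block_size)
    return 1.0 - combined / separate


# Hypothetical plate: 12,000 images of 300,000 bytes each on 1 MiB blocks.
sizes = [300_000] * 12_000
print(f"saving: {container_saving(sizes, 1024 * 1024):.1%}")
```

The smaller the images are relative to the block size, the larger the share of
space lost to rounding, which is why the gain depends on the storage system.
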
>> We could also significantly improve network transfer rates with the
>> containers. It's mostly the storage read/write rates that go up, and we could
>> improve throughput up to 3-fold! This is also great for archiving and backup.
>>
>> However, there are also downsides. First, it's more cumbersome to access the
>> images, because you add a layer of complexity. For us, the HDF5 support is
>> built into the storage database, so when we browse images on the web and
>> download them, the HDF5 is transparent. But every once in a while we need to
>> access the images on disk. To do that, we now use ImageJ, or
>> an archive extractor, or a Matlab HDF5 reader that we implemented ourselves.
>> It works, but it's good to know that the user experience is not the same.
>>
>> In my humble opinion the benefits outweigh the downsides. And I'd recommend
>> an HDF5-based format, because it's broadly supported and very open! Many large
>> players build on it, like NASA, so I hope it will be supported for a long time.
>> Also, the HDF5 support in Fiji/ImageJ is quite good. However, this still leaves
>> you with the question of how to "format" your HDF5 file. This is a separate
>> problem! Think of it like a zip archive: it's up to you what folder layout you
>> have inside the zip container, and this can still very much impact the
>> properties of working with the data.
>>
>> We chose the "H5AR" formatting (and library) from SIS, because it's easy to use,
>> and because they are the providers of Java HDF5 for Fiji/ImageJ. This is also
>> the format used by openBIS. It's actively developed and quite mature; we did
>> not encounter any problems. CellH5 would be a different alternative. And there
>> is the BigDataViewer HDF5 format.
>> I found H5AR very easy to use; it's really not much more than a container.
>> The ease of use finally decided it for us. If you find this format worthwhile,
>> I can also provide you with Matlab and Python readers for the HDF5
>> that transparently handle the container. So with the readers, you can work
>> with the container as if it were a directory:
>>     % Example: Matlab dir() and imread() wrappers:
>>     vFileList = dirh5('/path/to/data.h5ar');
>>     vImage = imreadh5(['/path/to/data.h5ar/' vFileList(1).name]);
>>
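To show the "container as a directory" idea in Python, here is a rough sketch.
Note that it is not the H5AR reader: it uses a zip archive from the Python
standard library as a stand-in container (following the zip analogy above), and
the function names and member paths are made up for illustration.

```python
import io
import zipfile


def dir_container(container):
    """List the members of the container, like dir() on a folder."""
    with zipfile.ZipFile(container) as zf:
        return sorted(zf.namelist())


def read_member(container, name):
    """Return the raw bytes of one member, like reading a file by path."""
    with zipfile.ZipFile(container) as zf:
        return zf.read(name)


# Build a tiny in-memory container with hypothetical well/site/channel names.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("A01/site1_ch1.tif", b"fake pixel data 1")
    zf.writestr("A01/site1_ch2.tif", b"fake pixel data 2")

names = dir_container(buf)
first_image = read_member(buf, names[0])
```

The layout inside the container (here one folder per well) is a free choice,
which is exactly the "formatting" question mentioned earlier.
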
>>
>> Cheers,
>>
>>     Mario
>>
>>
>>
>>
>> On 12.10.2016 16:40, Kai Schleicher wrote:
>>> Hi,
>>>
>>> We are looking for a container file format to store HTS/HCS data. This format
>>> needs to be read and written by Bio-Formats.
>>>
>>> The reason is that our storage file system deals much better with a few large
>>> files than with many small files. Hence we'd expect to increase usability and
>>> performance drastically when handling the data.
>>>
>>> The output files of our microscope are, for example, *.HTD (metadata / structure)
>>> next to *.TIF (images and thumbnails). Here each position, channel, time point
>>> and plane is saved as an individual *.TIF. This creates a lot of images for each
>>> multi-well plate screen.
>>>
>>> So far we have been looking into CellH5 which looks promising but development
>>> appears discontinued (correct me if I am wrong). Are there alternatives?
>>>
>>> Thanks for your help and cheers,
>>> Kai



Best regards,

    Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
D-81669 München                          http://www.biodataanalysis.de/

