[ome-users] File format for HTS/HCS data

Kai Schleicher kai.schleicher at unibas.ch
Thu Oct 13 13:01:30 BST 2016


Dear Mario,

thanks for this elaborate and very useful answer!

One limitation for us is that whatever container format we choose must be
*read by Bio-Formats*.

Being able to write the same format with Bio-Formats would be great too, but
we could probably work around that.

If I am not mistaken, only CellH5 meets this requirement. Are there
alternatives that Bio-Formats can read?

Thanks for your input and cheers,
Kai



On 10/13/2016 11:30 AM, Mario Emmenlauer wrote:
> Dear Kai,
>
> I have a slightly lengthy reply because we have been doing this for a while,
> with certain pros and cons. We have been storing single HCS images in HDF5
> containers to make them better suited for storage and archiving.
> Our common data sizes are 384-well plates, 9 sites per well, 3-4 channels.
> Therefore we have a total of ~12,000 images per plate. Currently we host some
> 5500 plates.
>
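The per-plate image count follows directly from the plate geometry above. A quick check in Python; `images_per_plate` is a hypothetical helper name used only for illustration:

```python
def images_per_plate(wells: int, sites_per_well: int, channels: int) -> int:
    """Number of single-image files when every well/site/channel is stored separately."""
    return wells * sites_per_well * channels

# 384-well plate, 9 sites per well, 3 or 4 channels (figures from the message above)
low = images_per_plate(384, 9, 3)
high = images_per_plate(384, 9, 4)
print(low, high)  # 10368 13824
```

The ~12,000 figure sits in the middle of this 10,368 to 13,824 range.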
> When we went to HDF5, we observed several significant effects. First,
> depending on the block size of the storage file system, we use less storage
> space! That's because many storage systems use a large block size and every
> file occupies whole blocks, which effectively "wastes" a bit of space on
> every image. We could gain up to 20% space by combining the images into one
> container.
>
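The block-size effect can be estimated with simple arithmetic: a file system typically allocates whole blocks, so the partially filled last block of each file is slack. A minimal sketch, assuming a plain whole-block allocation model (it ignores file-system metadata, tail packing, and the HDF5 file's own internal overhead); the image size and block size below are hypothetical, not measurements from the message:

```python
def allocated_bytes(file_size: int, block_size: int) -> int:
    """Bytes reserved for one file under a whole-block allocation model."""
    # Integer ceiling division: a partially filled last block still costs a full block.
    return -(-file_size // block_size) * block_size

def slack_fraction(file_sizes, block_size: int) -> float:
    """Fraction of allocated space that is unused when each size is stored as its own file."""
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(s, block_size) for s in file_sizes)
    if allocated == 0:
        return 0.0
    return (allocated - used) / allocated

# Hypothetical plate: 12,000 images of 2,000,000 bytes each, 1 MiB blocks.
sizes = [2_000_000] * 12_000
print(f"{slack_fraction(sizes, 1 << 20):.1%}")       # prints 4.6%  (one file per image)
print(f"{slack_fraction([sum(sizes)], 1 << 20):.1%}")  # prints 0.0%  (one container)
```

How much is recovered depends strongly on the image sizes relative to the block size; small files such as thumbnails waste proportionally much more.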
> We could also significantly improve network transfer rates with the
> containers. It's mostly the storage read/write rates that go up, and we could
> improve throughput up to 3-fold! This is also great for archiving and backup.
>
> However, there are also downsides. First, it's more cumbersome to access the
> images, because you add a layer of complexity. For us, HDF5 support is built
> into the storage database, so when we browse and download images on the web,
> the HDF5 layer is transparent. But every once in a while we need to access
> the images on disk. To do that, we now use ImageJ, an archive extractor, or
> a Matlab HDF5 reader that we implemented ourselves.
> It works, but it's good to know that the user experience is not the same.
>
> In my humble opinion the benefits outweigh the downsides. And I'd recommend
> an HDF5-based format, because it's broadly supported and very open! Many large
> players build on it, like NASA, so I hope it will be supported for a long time.
> Also, the HDF5 support in Fiji/ImageJ is quite good. However, this still leaves
> you with the problem of how to "format" your HDF5 file. This is a separate
> problem! Think of it like a zip archive: it's up to you what folder layout you
> have inside the zip container, and that layout can still strongly affect how
> convenient and fast it is to work with the data.
>
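To make the zip analogy concrete, here is a minimal sketch using Python's standard `zipfile` module as a stand-in for HDF5 (it does not produce an HDF5 or H5AR file). The `<well>/s<site>/w<channel>.tif` layout and the helper names are hypothetical; the point is that the layout chosen inside the container decides which lookups are cheap, here "all images of one well":

```python
import io
import zipfile

def pack_plate(images: dict) -> bytes:
    """Pack {(well, site, channel): image_bytes} into one in-memory zip container
    using a hypothetical <well>/s<site>/w<channel>.tif layout."""
    buf = io.BytesIO()
    # ZIP_STORED keeps the sketch simple: members are stored without compression.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for (well, site, channel), data in sorted(images.items()):
            zf.writestr(f"{well}/s{site}/w{channel}.tif", data)
    return buf.getvalue()

def list_well(container: bytes, well: str) -> list:
    """List the member names that belong to one well (a prefix match on the layout)."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return [name for name in zf.namelist() if name.startswith(f"{well}/")]

# Fake 16-byte "images" for 2 wells x 2 sites x 2 channels.
fake = {(w, s, c): b"\x00" * 16 for w in ("A01", "A02") for s in (1, 2) for c in (1, 2)}
blob = pack_plate(fake)
print(list_well(blob, "A01"))
# ['A01/s1/w1.tif', 'A01/s1/w2.tif', 'A01/s2/w1.tif', 'A01/s2/w2.tif']
```

A channel-first layout (`w<channel>/<well>/s<site>.tif`) would hold the same images but make per-channel access the cheap query instead.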
> We chose the "H5AR" format (and library) from SIS, because it's easy to use,
> and because they are the providers of Java HDF5 for Fiji/ImageJ. This is also
> the format used by openBIS. It's actively developed and quite mature; we did
> not encounter any problems. CellH5 would be a different alternative, and there
> is also the BigDataViewer HDF5 format.
> I found H5AR very easy to use; it's really not much more than a container.
> Its ease of use is what finally decided it for us. If you find this format
> worthwhile, I can also provide you with Matlab and Python readers for the HDF5
> that transparently handle the container. With these readers, you can work
> with the container as if it were a directory:
>      % Example: Matlab dir() and imread() wrappers:
>      vFileList = dirh5('/path/to/data.h5ar');
>      vImage = imreadh5(['/path/to/data.h5ar/' vFileList(1).name]);
>
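A wrapper with this directory-like interface has to decide which part of a path names the container and which part names a file inside it. A minimal Python sketch of that one step, assuming (hypothetically) that the container is the first path component ending in `.h5ar`; `split_container_path` is an illustrative name, not part of the readers offered above or of the H5AR library:

```python
from pathlib import PurePosixPath

def split_container_path(path: str, suffix: str = ".h5ar"):
    """Split '/a/b/data.h5ar/x/y.tif' into ('/a/b/data.h5ar', '/x/y.tif').

    Returns (path, None) when no component carries the container suffix,
    so the caller can fall back to ordinary file-system access.
    """
    parts = PurePosixPath(path).parts
    for i, part in enumerate(parts):
        if part.endswith(suffix):
            container = PurePosixPath(*parts[: i + 1])
            inner = "/" + "/".join(parts[i + 1 :])  # "/" means the container root
            return str(container), inner
    return path, None

print(split_container_path("/path/to/data.h5ar/A01/img.tif"))
# ('/path/to/data.h5ar', '/A01/img.tif')
```

The actual lookup of the inner path inside the HDF5 file is left out, since it depends on the container layout and the HDF5 library used.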
>
> Cheers,
>
>      Mario
>
>
>
>
> On 12.10.2016 16:40, Kai Schleicher wrote:
>> Hi,
>>
>> We are looking for a container file format to store HTS/HCS data. This format
>> needs to be read and written by Bio-Formats.
>>
>> The reason is that our storage file system deals much better with a few large
>> files than with many small files. Hence we expect usability and performance
>> to improve drastically when handling the data.
>>
>> The output of our microscope is, for example, *.HTD files (metadata/structure)
>> next to *.TIF files (images and thumbnails). Each position, channel, time point
>> and plane is saved as an individual *.TIF. This creates a lot of images for
>> each multi-well plate screen.
>>
>> So far we have been looking into CellH5, which looks promising, but its
>> development appears to have been discontinued (correct me if I am wrong).
>> Are there alternatives?
>>
>> Thanks for your help and cheers,
>> Kai
>>
>
>
> --
> BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
> Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
> D-81669 München                          http://www.biodataanalysis.de/
>
