[ome-users] can ome files OMETIFFWriter compress embedded xml?

Mario Emmenlauer mario at emmenlauer.de
Mon Apr 23 11:15:16 BST 2018


Dear Roger,

On 23.04.2018 11:31, Roger Leigh wrote:
> On 21/04/18 13:10, Mario Emmenlauer wrote:
>> I understand that each ome tiff file contains the full meta data xml,
>> so that "even if some of the TIFF files in a dataset are misplaced, the
>> metadata remains intact" [1]. I find this quite nice! But it also makes
>> me wonder if it would be possible to add compression of embedded xml in
>> ome tiff files?
>>
>> I am under the impression that ome-files supports gzip and bzip2 out
>> of the box in order to decode embedded image data in XML files. So it
>> seems relatively straightforward to add the reverse for embedded
>> XML in OME-TIFF :-)
>>
>> Of course the meta data generally makes up only a fraction of the image
>> data, so it might not be a super high priority. But on the other hand,
>> xml is notoriously big, and can easily add several KB per file.
> 
> Dear Mario,
> 
> It can end up being many tens of megabytes once you start storing extra
> metadata such as ROIs, so there is a big advantage to compression (or
> alternative binary representations).  On the other hand, since the
> parsed metadata model is stored in memory, there are still scalability
> concerns with regard to parsing and storing the resulting XML DOM tree
> and OME model object tree, which compression won't help with!
> 
> OME Files does by default link against zlib and libbz2, so the
> functionality to do the decompression is certainly present, but isn't
> actively used yet except by boost.iostreams (the OME-XML base64 BinData
> decompression support isn't implemented yet, since we don't yet have an
> OME-XML reader, and BinData is not used by the OME-TIFF reader in C++ or
> Java).  The primary concern with adding such support would be breaking
> the existing specification: compressed metadata would not be readable
> by existing readers that comply with the OME-TIFF specification¹.
> 
> An existing approach to reducing space wastage is the use of a companion
> file² so that the metadata is placed in a separate XML file and the TIFF
> files only have a small bit of XML referencing it.  If you're using
> large multi-file datasets, this will work today and is what I'd
> recommend using in the interim if possible.
> 
> Metadata compression could likely be done by compression and
> base64-encoding of the ImageDescription content.  If using a companion
> file, it could be directly compressed without the base64 encoding
> (companion.ome.xml.gz|bz2).  I have created a Trello card³ to track
> this.  However, I should stress that because this would require careful
> investigation due to the significant compatibility break it introduces,
> it is unlikely we would be able to work on this in the near-term.  Other
> formats we are investigating, such as HDF5, offer transparent
> compression and as we add support for them we can add support for
> metadata compression from the start, without any compatibility concerns.

Thanks for this very detailed explanation! I share your concerns about
backwards compatibility. On the other hand, it would help us with the
adoption of OME-TIFF if we could already foresee a more space-efficient
solution down the road. My understanding is that the number of options
is quite limited: I can think of only three or four ways to implement
compressed metadata in OME-TIFF. Do you think it would be possible to
continue our discussion and come up with a reasonable proposal over the
next 6 months? Your estimate would certainly help me a lot!
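
To make one of the options concrete, here is a rough Python sketch of the
"compress, then base64-encode" idea for the ImageDescription content. It is
only an illustration and not part of the OME-TIFF specification or the
OME Files API: the function names compress_metadata and decompress_metadata
are hypothetical, zlib is chosen arbitrarily (bz2 would work the same way),
and the sample XML is a made-up stand-in for a real OME-XML document.

```python
import base64
import zlib


def compress_metadata(ome_xml: str) -> str:
    """Deflate the XML text and base64-encode it (hypothetical helper).

    The base64 step keeps the payload printable ASCII, which a text
    field such as the TIFF ImageDescription tag expects.
    """
    raw = ome_xml.encode("utf-8")
    return base64.b64encode(zlib.compress(raw, 9)).decode("ascii")


def decompress_metadata(payload: str) -> str:
    """Reverse compress_metadata: base64-decode, then inflate."""
    return zlib.decompress(base64.b64decode(payload)).decode("utf-8")


if __name__ == "__main__":
    # Made-up, highly repetitive stand-in for ROI-heavy metadata.
    xml = "<OME>" + '<ROI ID="ROI:0"/>' * 1000 + "</OME>"
    packed = compress_metadata(xml)
    assert decompress_metadata(packed) == xml
    print(len(xml), "characters before,", len(packed), "after")
```

Base64 adds roughly a third on top of the compressed size, so this only
pays off when the XML compresses well. A compressed companion file would
not need that step at all.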

All the best,

    Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
D-81669 München                          http://www.biodataanalysis.de/

