[ome-users] can ome files OMETIFFWriter compress embedded xml?

Roger Leigh rleigh at dundee.ac.uk
Mon Apr 23 10:31:27 BST 2018


On 21/04/18 13:10, Mario Emmenlauer wrote:
> I understand that each ome tiff file contains the full meta data xml,
> so that "even if some of the TIFF files in a dataset are misplaced, the
> metadata remains intact" [1]. I find this quite nice! But it also makes
> me wonder if it would be possible to add compression of embedded xml in
> ome tiff files?
>
> I am under the impression that ome-files supports gzip and bzip2 out
> of the box in order to decode embedded image data in xml files. So it
> seems relatively straightforward to add the vice-versa for embedded
> xml in ome tif :-)
>
> Of course the meta data generally makes up only a fraction of the image
> data, so it might not be a super high priority. But on the other hand,
> xml is notoriously big, and can easily add several KB per file.

Dear Mario,

The embedded XML can grow to many tens of megabytes once you start storing
extra metadata such as ROIs, so there is a big advantage to compression (or
alternative binary representations).  On the other hand, since the
parsed metadata model is held in memory, there are still scalability
concerns with regard to parsing and storing the resulting XML DOM tree
and OME model object tree, which compression won't help with!
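
To illustrate both points, the following Python sketch compresses a synthetic block of ROI-like XML with zlib. The generator make_roi_xml and the markup it produces are hypothetical and not schema-valid OME-XML, and the sizes it prints are not measurements of real OME-TIFF files. It only shows that repetitive ROI markup compresses well, while a reader still has to inflate the full document before parsing it.

```python
import zlib


def make_roi_xml(n_rois: int) -> bytes:
    """Build a synthetic, OME-XML-like fragment with n_rois rectangle ROIs.

    Hypothetical and illustrative only: not a complete or schema-valid
    OME-XML document.
    """
    rois = "".join(
        f'<ROI ID="ROI:{i}"><Union>'
        f'<Rectangle ID="Shape:{i}:0" X="{i % 512}" Y="{(i * 7) % 512}" '
        f'Width="16" Height="16" TheZ="0" TheT="0" TheC="0"/>'
        "</Union></ROI>"
        for i in range(n_rois)
    )
    return f"<OME>{rois}</OME>".encode("utf-8")


xml_bytes = make_roi_xml(10_000)
compressed = zlib.compress(xml_bytes, 9)
print(f"raw XML:         {len(xml_bytes):,} bytes")
print(f"zlib-compressed: {len(compressed):,} bytes")

# Compression only shrinks what is stored on disk: a reader still has to
# inflate the whole document before building the DOM and OME model objects.
restored = zlib.decompress(compressed)
assert restored == xml_bytes
print(f"still to parse:  {len(restored):,} bytes")
```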

By default, OME Files links against zlib and libbz2, so the
functionality to do the decompression is certainly present, but it isn't
actively used yet except by boost.iostreams (the OME-XML base64 BinData
decompression support isn't implemented yet, since we don't yet have an
OME-XML reader, and BinData is not used by the OME-TIFF reader in C++ or
Java).  The primary concern with adding such support would be breaking
the existing specification: compressed metadata would not be readable
by existing readers that comply with the OME-TIFF specification¹.

An existing approach to reducing wasted space is to use a companion
file², so that the metadata is placed in a separate XML file and the TIFF
files contain only a small XML stub referencing it.  If you're using
large multi-file datasets, this works today and is what I'd
recommend in the interim if possible.
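
As an illustration of the companion-file layout, the stub XML written into each TIFF could be generated along these lines. This Python sketch assumes the 2016-06 OME schema namespace and its BinaryOnly element (with MetadataFile and UUID attributes) as described in the partial OME-XML section of the specification; it omits xsi:schemaLocation and the warning comment, and the helper binary_only_stub and the file name dataset.companion.ome are hypothetical.

```python
import uuid
import xml.etree.ElementTree as ET

# Assumed OME schema namespace (2016-06); serialize it as the default xmlns.
OME_NS = "http://www.openmicroscopy.org/Schemas/OME/2016-06"
ET.register_namespace("", OME_NS)


def binary_only_stub(companion_name: str, companion_uuid: str) -> str:
    """Return a minimal OME-XML stub for one TIFF's ImageDescription.

    The stub gets its own UUID and points at the shared companion
    metadata file via a BinaryOnly element (hypothetical helper).
    """
    root = ET.Element(f"{{{OME_NS}}}OME", {"UUID": f"urn:uuid:{uuid.uuid4()}"})
    ET.SubElement(
        root,
        f"{{{OME_NS}}}BinaryOnly",
        {"MetadataFile": companion_name, "UUID": companion_uuid},
    )
    return ET.tostring(root, encoding="unicode")


# Every TIFF in the dataset references the same companion file and UUID.
companion_uuid = f"urn:uuid:{uuid.uuid4()}"
stub = binary_only_stub("dataset.companion.ome", companion_uuid)
print(stub)
```

The full metadata then lives once in the companion file, and each TIFF carries only a few hundred bytes of XML.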

Metadata compression could likely be implemented by compressing and
base64-encoding the ImageDescription content.  If using a companion
file, it could be compressed directly without the base64 encoding
(companion.ome.xml.gz|bz2).  I have created a Trello card³ to track
this.  However, I should stress that, because the significant
compatibility break this introduces would require careful investigation,
it is unlikely we would be able to work on this in the near term.  Other
formats we are investigating, such as HDF5, offer transparent
compression, and as we add support for them we can support metadata
compression from the start, without any compatibility concerns.
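
A minimal Python sketch of the two encodings described above. Nothing here is defined by the OME-TIFF specification or OME Files: the function names, the choice of zlib or bzip2, and the file name dataset.companion.ome.xml.gz are hypothetical. A real scheme would also need some way to signal that (and how) an ImageDescription is compressed; this sketch simply passes the method out of band.

```python
import base64
import bz2
import gzip
import tempfile
import zlib
from pathlib import Path

# Hypothetical scheme: neither the OME-TIFF specification nor OME Files
# defines this encoding or these helper names.
_CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
}


def pack_description(xml: str, method: str = "zlib") -> str:
    """Compress OME-XML, then base64-encode it so the result is plain ASCII
    text that could be stored in the TIFF ImageDescription tag."""
    compress, _ = _CODECS[method]
    return base64.b64encode(compress(xml.encode("utf-8"))).decode("ascii")


def unpack_description(text: str, method: str = "zlib") -> str:
    """Reverse pack_description: base64-decode, then decompress.
    The method must be known out of band; nothing in the text records it."""
    _, decompress = _CODECS[method]
    return decompress(base64.b64decode(text)).decode("utf-8")


def write_companion(xml: str, path: Path) -> None:
    """Write a companion metadata file compressed directly (no base64),
    picking gzip or bzip2 from the file suffix (.gz or .bz2)."""
    opener = {".gz": gzip.open, ".bz2": bz2.open}[path.suffix]
    with opener(path, "wt", encoding="utf-8") as f:
        f.write(xml)


# Usage on a small, made-up document.
xml = "<OME>" + '<Image ID="Image:0"/>' * 500 + "</OME>"
desc = pack_description(xml, "bzip2")
assert desc.isascii() and unpack_description(desc, "bzip2") == xml

with tempfile.TemporaryDirectory() as tmp:
    companion = Path(tmp) / "dataset.companion.ome.xml.gz"
    write_companion(xml, companion)
    with gzip.open(companion, "rt", encoding="utf-8") as f:
        assert f.read() == xml
```

Base64 makes the encoded text about a third larger than the compressed bytes, which is the price of keeping ImageDescription plain ASCII; a directly compressed companion file avoids that overhead.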


Kind regards,
Roger


¹ https://docs.openmicroscopy.org/latest/ome-model/ome-tiff/specification.html
² https://docs.openmicroscopy.org/ome-model/5.6.3/ome-tiff/specification.html#partial-ome-xml-metadata
³ https://trello.com/c/WOq9BTfH/159-ome-tiff-metadata-compression

--
Dr Roger Leigh -- Open Microscopy Environment
Wellcome Trust Centre for Gene Regulation and Expression,
College of Life Sciences, University of Dundee, Dow Street,
Dundee DD1 5EH Scotland UK   Tel: (01382) 386364

The University of Dundee is a registered Scottish Charity, No: SC015096