[ome-devel] OME-TIFF specification updates

Fri Feb 9 20:39:39 GMT 2007

Hi everyone,

In recent developer discussions, several suggestions for improving the
OME-TIFF specification have emerged. Very briefly, they are:

A) Allow OME-TIFF datasets to use BigTIFF, rather than normal 32-bit
TIFF, as appropriate.

B) Update the recommended filename extension from .tif to .ome.tif, for clarity.

C) Include an XML comment at the top of the OME-XML metadata block
with a warning to those who would edit the block by hand.

D) Store a hash (SHA1) of the pixel data associated with each TiffData element.

E) Store a hash (SHA1) of the OME-XML metadata in a custom TIFF tag.

(A) allows single OME-TIFF files larger than 4 GB in size, making it
possible to represent a very large image as a single OME-TIFF file,
just like you can with OME-XML. (B) through (E) are designed to
address dangers with OME-TIFF's inherent compatibility with regular
(non-OME-aware) TIFF software.

Please note that all of these changes are purely optional enhancements
to the specification, meaning that existing OME-TIFF files would
remain valid.

I invite the community to provide comments and suggestions on these
updates. If there are no objections or further ideas within the next few
weeks, I will integrate these changes with the OME-TIFF documentation
(http://www.loci.wisc.edu/ome/ome-tiff.html), and we will begin
implementing the suggestions in our OME-TIFF software (WiscScan,
Bio-Formats, etc.).

Additional detail on each point follows.

A) Allow OME-TIFF datasets to use BigTIFF, rather than normal 32-bit
TIFF, as appropriate.

BigTIFF (http://www.awaresystems.be/imaging/tiff/bigtiff.html) is a
"least necessary changes" variant of TIFF that uses 64-bit (and
beyond) pointers, allowing what are essentially TIFF files of
extremely large size. It is currently being actively developed, with a
full libtiff implementation due in July 2007. Expanding the OME-TIFF
specification to allow OME-TIFF files to use BigTIFF makes it possible
to represent a very large image within a single OME-TIFF file, just
like you can with OME-XML.

This solution has the advantage that any application using libtiff as
the basis for reading OME-TIFF would transparently support the BigTIFF
extension, once the new version of libtiff is available. The downside
is that readers that do not use libtiff would be more difficult to
implement. However: (a) extending a TIFF reader implementation to
support BigTIFF is very straightforward due to BigTIFF's extreme
similarity to regular TIFF; and (b) we will list reader support for
BigTIFF as optional for an application -- just as support for many
aspects of regular TIFF is not part of the "baseline" requirements
for TIFF readers.
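
To illustrate point (a), here is a minimal Python sketch of how a reader
that does not use libtiff might tell the two variants apart from the file
header. It relies on the published header layouts: classic TIFF uses
version number 42 followed by a 4-byte offset to the first IFD, while
BigTIFF uses version number 43, a 2-byte offset size (always 8), two zero
bytes, and then an 8-byte offset. The function names are made up for this
example and do not come from any existing library.

```python
import struct


def tiff_variant(header: bytes) -> str:
    """Classify the first bytes of a file as 'tiff', 'bigtiff' or 'unknown'."""
    if len(header) < 4:
        return "unknown"
    if header[:2] == b"II":      # little-endian byte order
        endian = "<"
    elif header[:2] == b"MM":    # big-endian byte order
        endian = ">"
    else:
        return "unknown"
    (version,) = struct.unpack(endian + "H", header[2:4])
    if version == 42:
        return "tiff"
    if version == 43:
        return "bigtiff"
    return "unknown"


def first_ifd_offset(header: bytes) -> int:
    """Return the offset of the first IFD for either variant."""
    variant = tiff_variant(header)
    if variant == "unknown":
        raise ValueError("not a TIFF or BigTIFF header")
    endian = "<" if header[:2] == b"II" else ">"
    if variant == "tiff":
        # Classic TIFF: a 4-byte offset directly after the version number.
        (offset,) = struct.unpack(endian + "I", header[4:8])
        return offset
    # BigTIFF: offset size (must be 8), a reserved zero, then an 8-byte offset.
    offset_size, reserved, offset = struct.unpack(endian + "HHQ", header[4:16])
    if offset_size != 8 or reserved != 0:
        raise ValueError("malformed BigTIFF header")
    return offset
```

In practice a reader would pass the first 16 bytes of the file, for
example `first_ifd_offset(open(path, "rb").read(16))`. The rest of the
parsing differs mainly in field widths, which is why the extension is
expected to be a small change.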

B) Update the recommended filename extension from .tif to .ome.tif
(and .tiff to .ome.tiff), for clarity.

Encouraging an extension of .ome.tif would allow software to more
easily identify when a TIFF file is potentially OME-TIFF, while
retaining the ultimate .tif extension for use with regular TIFF
software. I see no downside to this idea.
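
As a small sketch of the kind of check this would enable, the Python
below tests a filename for the proposed compound extension. The helper
name is hypothetical, and a match only suggests that the file may be
OME-TIFF; the contents would still need to be inspected.

```python
import os


def looks_like_ome_tiff(filename: str) -> bool:
    """True if the name ends in .ome.tif or .ome.tiff (case-insensitive)."""
    return filename.lower().endswith((".ome.tif", ".ome.tiff"))


# Regular TIFF software that only looks at the final extension still
# sees an ordinary TIFF file.
final_extension = os.path.splitext("cells.ome.tif")[1]
```

Here `final_extension` is `.tif`, which is the part a non-OME-aware
application would key on.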

C) Include an XML comment at the top of the OME-XML metadata block
with a warning to those who would edit the block by hand.

This warning would be useful to alert users about the dangers of
editing the OME-XML block by hand when using TIFF applications with
comment editing capabilities. While a savvy user might be capable of
editing the block manually, many users should be discouraged (but not
prevented) from doing so recklessly. The only downside to such a
comment is the few extra (negligible) bytes necessary to include it.
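
For concreteness, a writer might insert the comment roughly as in the
Python sketch below. The wording of the warning is only a placeholder
(the actual text has not been decided), and the function name is
invented for this example. The one real constraint is that, if the block
begins with an XML declaration, the comment has to come after it.

```python
# Placeholder wording; the final text of the warning is still open.
WARNING_COMMENT = (
    "<!-- Warning: this comment is an OME-XML metadata block. "
    "Editing it by hand may make the file invalid. -->"
)


def add_warning_comment(ome_xml: str) -> str:
    """Return the OME-XML block with the warning comment placed at the top."""
    if ome_xml.startswith("<?xml"):
        # Keep the XML declaration first; the comment goes right after it.
        end = ome_xml.index("?>") + 2
        return ome_xml[:end] + "\n" + WARNING_COMMENT + ome_xml[end:]
    return WARNING_COMMENT + "\n" + ome_xml
```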

D) Store a hash (SHA1) of the pixel data associated with each TiffData element.

One possible problem arises when an OME-TIFF file is: a)
read into an imaging application; b) resized, cropped or otherwise
altered in some way; and c) saved back to disk, preserving the TIFF
comment (OME-XML block). Such a file would still appear to be
OME-TIFF, but would be invalid, because the XML metadata would no
longer accurately reflect details about the pixels such as image
dimensions, endianness, etc.

To avoid this problem, a SHA1 hash string can be computed for each
block of pixel data (one or more image planes) denoted by a TiffData
element. This hash would be an optional attribute of TiffData, and
would be useful for verifying that the pixels have not been altered in
some way -- "out from under" the OME-XML metadata -- by a
non-OME-aware TIFF application. Such knowledge could be used by
OME-aware applications to warn users of potentially (but not
necessarily) compromised data -- entering a "cautious" mode rather
than assuming the OME-TIFF is completely correct.

The downsides to storing such SHA1 strings are: a) negligible increase
in the size of the OME-XML metadata blocks; and b) increased
complexity of OME-TIFF writers. But since inclusion of the SHA1 hashes
is purely optional, (b) should not pose a threat to adoption.
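
To give a sense of the extra writer and reader logic involved, here is a
rough Python sketch using the standard hashlib module. It assumes the
hash covers the plane bytes concatenated in the order the TiffData
element references them; exactly which bytes are hashed (for example,
before or after compression) is a detail the specification would still
need to pin down. The function names are hypothetical.

```python
import hashlib
from typing import Iterable


def sha1_of_planes(planes: Iterable[bytes]) -> str:
    """Hex SHA1 digest over one or more image planes, taken in order."""
    digest = hashlib.sha1()
    for plane in planes:
        digest.update(plane)
    return digest.hexdigest()


def pixels_match(planes: Iterable[bytes], stored_sha1: str) -> bool:
    """True if the planes still hash to the value stored in the metadata."""
    return sha1_of_planes(planes) == stored_sha1.lower()
```

A writer would store the result of `sha1_of_planes` as the optional
TiffData attribute; a reader would call `pixels_match` and switch to its
"cautious" mode when the result is false.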

E) Store a hash (SHA1) of the OME-XML metadata in a custom TIFF tag.

Similarly to hashing the pixels, a SHA1 hash of the OME-XML metadata
block could be stored in a custom TIFF tag (registered with Adobe).
This hash could be used to verify that the OME-XML metadata block has
not been altered in some "under the table" way. Again, this knowledge
could be used by OME-aware applications to warn users of potentially
(but not necessarily) compromised data, and enter some kind of
"cautious" mode.

It is important to note that many (all?) non-OME-aware TIFF
applications would not only ignore this custom TIFF tag, but also
would not write it back out to disk when the TIFF is resaved. As such,
when the custom tag is missing, OME-aware applications could suspect
the file may have been resaved by a third-party application, and enter
"cautious" mode accordingly.
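
The reader-side decision described above could look something like the
following Python sketch. It assumes the application has already read the
OME-XML block as bytes and has looked up the custom tag, passing None
when the tag is absent. The function name and the returned status
strings are invented for illustration.

```python
import hashlib
from typing import Optional


def metadata_status(ome_xml: bytes, stored_sha1: Optional[str]) -> str:
    """Decide whether to trust the OME-XML block or enter cautious mode."""
    if stored_sha1 is None:
        # Tag missing: the file may have been resaved by other software.
        return "cautious: hash tag missing"
    actual = hashlib.sha1(ome_xml).hexdigest()
    if actual != stored_sha1.strip().lower():
        # Tag present but the block no longer matches it.
        return "cautious: metadata changed"
    return "trusted"
```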

One suggested alternative to storing a SHA1 hash of the metadata block
would be to store a complete copy of the block in the custom tag
instead. This solution would give OME-aware applications more
information to infer exactly how an OME-XML block has changed, but has
the downside of duplicating the XML wholesale, which is somewhat ugly
and inelegant. Moreover, I do not have a clear idea of the practical
advantage of such a scheme over the SHA1 solution -- comments from
anyone interested are welcome.

Again, I invite the community to participate with any comments,
suggestions or questions regarding these updates, or other ideas for
improving the OME-TIFF specification.

Thanks,
Curtis