[ome-devel] Storing single plane OME-TIFF files (Was: Zeiss 710)

Thu Oct 21 06:21:38 BST 2010

Hi Andrew,

From a technical perspective, I really like the first proposed "MultiPart"
solution. Of course, I don't have a real use case like Rubén does; at LOCI we
generally use one OME-TIFF file per time point (storing all Z and C
positions for that time point in one file), and we aren't doing HCS.

I also agree that MultiPart should be used only in extreme cases like
Rubén's, since it greatly increases the chance of metadata loss.

One thing we could consider is recommending a different file extension for
OME-TIFF "slave" files versus the master... though off the top of my head I
can't think of a good alternate extension (something like ".ome-nometa.tif"
is pretty unwieldy). If we did that, then users could more easily tell which
file is the master based on the filename.

-Curtis

On Wed, Oct 20, 2010 at 2:52 PM, Andrew Patterson <
ajpatterson at lifesci.dundee.ac.uk> wrote:

> Hello Ruben,
>
> Thanks for your work with us and all the sample data you have provided over
> the past few months. It is always very useful to see the kind of data people
> wish to store: while we can think about what data people will have, there
> is nothing like real data to test our model with.
>
> In my reply I hope you do not mind if I open out the discussion to the
> general case. Some of this will address your problem; other areas, I
> believe from your mails with Melissa and Curtis, are solutions you have
> already rejected for your own reasons. This is understandable, as we are
> aiming to provide a general solution, which cannot be the ideal solution in
> all cases.
>
> The growth of imaging systems and dataset sizes since our first OME-XML
> model in 2003 has been impressive and we have been expanding OME-XML as we
> have gone along to encompass new meta-data like the Screen/Plate/Well
> extension. This has in turn allowed the image collections to grow and grow.
> In theory the size of an OME-XML file is mainly limited by the size of the
> underlying file system it is stored on, but OME-XML is not the most
> efficient way of representing binary data.
>
> An alternative storage solution for the binary part of the data is
> desirable; in this case our solution was OME-TIFF. This has two key
> advantages. First, it can be viewed as "just a TIFF", making it familiar,
> useful and acceptable to many people. Second, like OME-XML, a TIFF file can
> contain multiple image planes in one file. This allowed us to
> continue the "everything in one place" approach of keeping an image and its
> metadata together.
>
> OME-TIFF does have a problem, however: the TIFF file format uses 32-bit
> offsets and, as such, a file is limited to 4 gigabytes. That was a massive
> file when the format was designed in the late 1980s.
>
> To get round this limitation we introduced the idea of multi-part
> OME-TIFFs. This allows the binary image plane data to be stored in 2 or more
> OME-TIFF files. Each of these files contains the full metadata and pointers
> to the location of all the files containing the rest of the
> image planes. This allows any file to be opened, the completeness of the
> data detected, and absent parts hopefully located.
>
> (Another solution to the 4 gigabyte limit is a BigTIFF variant of OME-TIFF
> but I will not deal with that here.)
>
> As is often the case, as soon as you introduce flexibility into a solution
> (in this case, the number of files), different use cases will pull the best
> solution in different directions. Our solution was designed to
> favour multiple image planes stored in each OME-TIFF, producing a smaller
> number of larger files. The reasons for this are varied but are based on our
> experience and the problems people can have managing vast numbers of files
> on the file system.
>
> While it is proper and VALID to store a single plane in each OME-TIFF, it
> can make for an unwieldy dataset and, as you have noticed, is not the most
> space-efficient way to store the data. The reason for this is two-fold.
> Firstly, if a file is used for each plane, the location of every single plane
> has to be listed individually in a TiffData node. The way this node is
> designed to work is to point to the location of the first image plane and
> use the "PlaneCount" attribute (old name "NumPlanes") to say how many planes
> to read in sequence from that starting point in the TIFF structure. The use
> of multi-plane TIFFs allows for a much smaller number of TiffData nodes.
> Secondly, because of our approach of always keeping metadata with its image
> data, this metadata gets duplicated in each file. This was a decision we have
> taken as a general principle; of course anything can be up for review.
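
To make the TiffData point concrete, here is a rough Python sketch (not taken from any OME tool) that builds the TiffData nodes for a stack stored one plane per file and for the same stack stored in a single multi-plane file, then compares the serialized size. The file names, the plane count of 100 and the choice of FirstZ as the indexed dimension are assumptions made only for illustration.

```python
import uuid
import xml.etree.ElementTree as ET


def single_plane_nodes(n_planes):
    """One TiffData node per plane: each plane lives in its own file."""
    nodes = []
    for z in range(n_planes):
        node = ET.Element("TiffData", IFD="0", FirstZ=str(z), PlaneCount="1")
        # Hypothetical file naming scheme, for illustration only.
        ref = ET.SubElement(node, "UUID", FileName=f"plane_{z:04d}.ome.tif")
        ref.text = f"urn:uuid:{uuid.uuid4()}"
        nodes.append(node)
    return nodes


def multi_plane_nodes(n_planes):
    """A single TiffData node: PlaneCount covers the whole run of planes."""
    node = ET.Element("TiffData", IFD="0", FirstZ="0", PlaneCount=str(n_planes))
    ref = ET.SubElement(node, "UUID", FileName="stack.ome.tif")  # hypothetical name
    ref.text = f"urn:uuid:{uuid.uuid4()}"
    return [node]


def xml_bytes(nodes):
    """Total serialized size of the given nodes, in bytes."""
    return sum(len(ET.tostring(n)) for n in nodes)


many = single_plane_nodes(100)
one = multi_plane_nodes(100)
print(len(many), "nodes,", xml_bytes(many), "bytes")
print(len(one), "node,", xml_bytes(one), "bytes")
```

With one file per plane, this block of nodes is then repeated in every file of the set, which is where the duplication described above comes from.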
>
> So what can be done to reduce the size of your data if you need for some
> reason to use single plane TIFFs?
> Well, under the current schema your options are limited. You can tinker with
> the exact structure of the "TiffData" nodes and "UUID" nodes. For example
> instead of having 'IFD="0"' in every "TiffData" node it can be omitted as
> the attribute defaults to 0. A more drastic change is to omit the FileName
> attribute from the "UUID" node. This still produces valid files as the key
> value for piecing a file together on import should be the UUID, the FileName
> is just a hint where to look first. An importer should scan the other files
> in the same folder looking for missing parts based on the UUID - this
> approach allows file sets to survive renaming. This file size optimisation
> does of course make import of the dataset less efficient.
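
The lookup described here (FileName as a hint, UUID as the key) could be sketched roughly as follows. This is not how any particular importer is implemented. To stay self-contained it assumes the OME-XML has already been read out of each TIFF header into a dict keyed by file name, and the helper names `file_uuid` and `locate` are hypothetical.

```python
import xml.etree.ElementTree as ET


def file_uuid(ome_xml):
    """Return the UUID attribute of the root OME element, or None if absent."""
    return ET.fromstring(ome_xml).get("UUID")


def locate(wanted_uuid, hint, xml_by_file):
    """Find the file carrying wanted_uuid.

    xml_by_file maps file name -> OME-XML string for every file in the folder
    (assumed to have been extracted from the TIFF headers beforehand).
    """
    # Cheap path: the FileName hint is present and still points at the right file.
    if hint in xml_by_file and file_uuid(xml_by_file[hint]) == wanted_uuid:
        return hint
    # Fallback: scan the folder and match on UUID, so renamed files are found.
    for name, ome_xml in xml_by_file.items():
        if file_uuid(ome_xml) == wanted_uuid:
            return name
    return None


folder = {
    "a.ome.tif": '<OME UUID="urn:uuid:aaa"/>',
    "renamed.ome.tif": '<OME UUID="urn:uuid:bbb"/>',
}
print(locate("urn:uuid:bbb", "b.ome.tif", folder))  # stale hint, found by scan
print(locate("urn:uuid:bbb", None, folder))         # no hint at all
```

Dropping FileName means every lookup takes the fallback path, which is the import cost mentioned above.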
>
> So what changes could be made to the schema to increase the efficiency of
> storage using only single plane OME-TIFF files?
> First I must say these suggestions largely go against our goal of always
> keeping the metadata with the binary image data. But this is something that
> may need to be revised.
>
> One possible solution is to add a multi-part attribute to the top level OME
> node and strip almost all of the OME-XML from all the single plane TIFFs.
> This metadata would then reside in a master file and look much as it does
> now. The new attribute MultiPart, if set, would be the UUID of the master
> file of the set the file is part of.
>
> This would reduce the OME-XML in the TIFF header of each single plane file
> to a single empty OME node:
>    <OME xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>    UUID="urn:uuid:707e7b82-155a-43db-9f3d-1af4e6a212f8"
>    MultiPart="urn:uuid:4062f7ac-dc41-11df-abaf-774b41a01549"
>    xsi:schemaLocation="http://www.openmicroscopy.org/Schemas/OME/20??-??
> http://www.openmicroscopy.org/Schemas/OME/20??-??/ome.xsd"
>    xmlns="http://www.openmicroscopy.org/Schemas/OME/20??-??"/>
>
> The Master file would start:
>    <OME xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>    UUID="urn:uuid:4062f7ac-dc41-11df-abaf-774b41a01549"
>    MultiPart="urn:uuid:4062f7ac-dc41-11df-abaf-774b41a01549"
>    xsi:schemaLocation="http://www.openmicroscopy.org/Schemas/OME/20??-??
> http://www.openmicroscopy.org/Schemas/OME/20??-??/ome.xsd"
>    xmlns="http://www.openmicroscopy.org/Schemas/OME/20??-??">
>      <Plate...
> Followed by the rest of the metadata as it is now. You can tell it is the
> master as the UUID of the file and the UUID of the MultiPart are the same.
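
A reader could apply the master test described above along these lines. This is only a sketch: MultiPart is the attribute proposed in this thread, not part of any released schema, and the function name `classify` and the "standalone" label for files without the attribute are assumptions. The namespace declarations are left out of the sample strings for brevity.

```python
import xml.etree.ElementTree as ET


def classify(ome_xml):
    """Classify a file by comparing its UUID with the proposed MultiPart value."""
    root = ET.fromstring(ome_xml)
    file_id = root.get("UUID")
    set_id = root.get("MultiPart")  # proposed attribute, not in the current schema
    if set_id is None:
        return "standalone"         # ordinary file carrying its own metadata
    return "master" if set_id == file_id else "part"


part = ('<OME UUID="urn:uuid:707e7b82-155a-43db-9f3d-1af4e6a212f8" '
        'MultiPart="urn:uuid:4062f7ac-dc41-11df-abaf-774b41a01549"/>')
master = ('<OME UUID="urn:uuid:4062f7ac-dc41-11df-abaf-774b41a01549" '
          'MultiPart="urn:uuid:4062f7ac-dc41-11df-abaf-774b41a01549"/>')

print(classify(part))
print(classify(master))
```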
>
> This is a clean and ruthless culling of all the metadata in the individual
> files. The resulting single files in isolation do not have even enough
> metadata to display the image properly.
>
>
> A second possible solution is to split the data on an Image basis. This
> would not be as space efficient as the above solution but would at least
> leave some of the metadata intact in the files.
>
> Think of the metadata as belonging to a few distinct types.
>    Screen/Plate/Well description
>    Project/Dataset description
>    Instrument description
>    Image description
>
> In the sample Plate data you sent there are 260 Images. The data for all of
> these images is in each file in the set. What we could do is alter the
> structure so there is a master file containing:
>    Screen/Plate/Well description
>    Project/Dataset description
>    Instrument description
>    Image description
> This master file would be largely identical to the example above.
>
> Each single plane TIFF file would be marked as MultiPart as above but would
> have some metadata. It would contain the 'Image description' metadata for
> only the Image it is part of. This is of course a more complex set of files
> for an application to read and write. You would also end up with the problem
> that you need to have consistent IDs across all the files, and you may have
> the problem of the metadata in individual single plane TIFFs referring to
> IDs that are not present in their own metadata, for example, a light source
> only defined in the Master file.
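
The dangling-ID problem can be detected mechanically. The sketch below assumes a simplification of the OME-XML convention: any element whose name ends in "Ref" is treated as a reference, and every other element with an ID attribute as a definition. The function name `dangling_refs` and the sample fragment are illustrative only.

```python
import xml.etree.ElementTree as ET


def dangling_refs(ome_xml):
    """Return the IDs that are referenced but not defined in this file."""
    root = ET.fromstring(ome_xml)
    defined, referenced = set(), set()
    for element in root.iter():
        ident = element.get("ID")
        if ident is None:
            continue
        # Strip any "{namespace}" prefix ElementTree puts on the tag name.
        local_name = element.tag.rsplit("}", 1)[-1]
        # Assumption: "...Ref" elements are references, the rest are definitions.
        if local_name.endswith("Ref"):
            referenced.add(ident)
        else:
            defined.add(ident)
    return sorted(referenced - defined)


single_plane = (
    '<OME><Image ID="Image:0">'
    '<InstrumentRef ID="Instrument:0"/>'
    '</Image></OME>'
)
print(dangling_refs(single_plane))
```

A reader of the split layout would have to resolve every ID returned here against the master file.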
>
> These are just two possible suggestions. I would like to gauge the feeling of
> the community on this matter.
> It is a departure from our current position. While we have had the idea of
> a separate MetadataOnly file, it was not something we have encouraged.
>
> How important is this kind of storage across a vast number of single plane
> files?
>
> Thanks again for letting us look at your data and raising your use case with
> us. I would love to hear other people's thoughts and suggestions on this.
>
> Cheers,
>
> Andrew