[ome-devel] HDF5 support

Curtis Rueden ctrueden at wisc.edu
Fri May 11 19:53:49 BST 2012


Hi Stephan, Jason, et al.,

> HDF5 might certainly be a way to transport metadata, analytics, etc. But
> there are a large number of different types of data that could be stored in
> an HDF5 file, including images, metadata, analytics, annotations, etc. If a
> common, well-defined and performant specification emerges for this, we will
> support it. We'd emphasise that this will require not just a good spec, but
> also good, supported, maintained, cross-platform software libraries for
> reading and writing these data types, and good, well-worked example files.

A solid HDF5-based file format for multidimensional life sciences image data is
something that we all want, and has been asked about on this list before [1,
2]. To be clear, I think what Jason is saying above is that we (the OME group)
are not currently designing such a format, though we would be happy to partner
with someone who is. (Jason, please correct me if I am wrong here.)

That said, an OME-HDF file format is something we did pursue a few years back.
Unfortunately, the work was never fully realized as a published specification.
But we did complete a white paper proposal, and then discussed it in quite some
detail via private mail. From my perspective, it is past time that we make this
work and discussion available to others, whether as a potential starting point
or simply as an illustration of what happened in the past and of potential
pitfalls in the design of an "OME-HDF" file format.

Hence, I am forwarding on this discussion (from 2007-2008). Please feel free to
have a look, and reply with any thoughts and questions. Please note
that the attached draft was merely a proposal, and not something formally
published or supported by the OME project. Again, we do not currently have
dedicated resources to develop an open HDF5-based file format specification,
but would be happy to partner with someone who does.

Regards,
Curtis

[1] http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2009-February/001156.html
[2] http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2009-February/001153.html


---------- Forwarded message ----------
From: Curtis Rueden <ctrueden at wisc.edu>
Date: Mon, Jun 2, 2008 at 11:32 AM
Subject: Re: OME-HDF5

Hi Josh et al.,

In preparation for the June developer meeting, I met with Eric Kjellman
today and we discussed the points you made in your OME-HDF email a while
back. What follows are some point-by-point comments. Hopefully we'll
have time to discuss things in more depth in person later this month.

> Thanks for sending this around. It's great to have something concrete
> to compare our needs against. In our case that includes the
> possibility of representing the internal format that will be used by
> the server for sending both raw and rendered data to clients. The
> primary requirement for that is that a pixels representation can be
> chosen which optimizes for speed. It seems like the dimensions
> "short-circuiting" -- allowing a blob to be put at any point in the
> pixels tree -- is an ideal solution.

Yep, I think the flexibility of pixels representation is a real
strength. Saying you want to "optimize for speed" requires an
elaboration of context, however -- the representation that is fastest for
one use case will not be fastest for another. LOCI's main use case is plane-by-
plane processing and/or visualization, so we nearly always want to
divide the pixels into planar blocks, but having the option to do
otherwise is nice. It's just a pain for us developers during
implementation because we want to support all legal representations, as
well as maximize the file format reader's performance regardless of the
representation chosen.
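
To make the plane-by-plane case concrete, here is a minimal h5py sketch of
a per-plane chunked layout; the file name, group path, and sizes are
hypothetical, not part of the draft spec:

    import h5py
    import numpy as np

    # Hypothetical 5D pixels blob (t, c, z, y, x), chunked one XY plane
    # at a time so that reading any single plane touches exactly one chunk.
    with h5py.File("blob_demo.h5", "w") as f:
        f.create_dataset(
            "pixels_root/Pixels_01/data",
            shape=(16, 2, 1, 240, 320), dtype=np.uint16,
            chunks=(1, 1, 1, 240, 320),  # one chunk per plane
        )

    with h5py.File("blob_demo.h5", "r") as f:
        plane = f["pixels_root/Pixels_01/data"][3, 1, 0]  # one chunk read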

> Regarding the metadata, however, my naive approach would have been
> quite different, but basically centers around using HDF datatypes to
> the fullest. Encoding dimensions as a single string is one example of
> something that can be done straightforwardly in HDF.

We understand that the dimensionality can be encoded as part of the
actual data block, but that doesn't help with compressed data. There is
also an issue of convention regarding dimension order -- i.e., which
dimensions are which conceptually. Can HDF provide dimension labels?
From your example, it seems you are storing the dimension labels and
order (and we would suggest pixel type) in a dedicated metadata node,
OME/Pixel.
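
For what it's worth, HDF5 does offer dimension labels through its Dimension
Scales API; a minimal h5py sketch, purely illustrative (the dataset name and
axis names here are assumptions):

    import h5py
    import numpy as np

    with h5py.File("dims.h5", "w") as f:
        d = f.create_dataset("pixels", shape=(16, 2, 240, 320),
                             dtype=np.uint16)
        # Label each axis using the HDF5 Dimension Scales API.
        for i, name in enumerate(("t", "c", "y", "x")):
            d.dims[i].label = name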

> Though maybe I'm missing something, since I don't understand the
> purpose of the 1-dimensional dataset (0-dimensional?) "42" under your
> dimensions specifier.

The purpose of 42 (it can be any number, really) is to specify the pixel
type of the data encoded in the blobs: the 42 leaf is stored using the
same HDF5 datatype as the uncompressed pixels, so a reader can recover
that type from it. This is especially useful if the binary data is
compressed, since we would not otherwise know the pixel type of the
uncompressed blob, which may differ from that of the compressed one.

As mentioned above, we can encode this pixel type information somewhere
else, but it must be noted somewhere.
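
One alternative, sketched below with h5py, is to record the uncompressed
pixel type as an attribute on the pixels group instead of a sentinel
dataset; the file, group, and attribute names are hypothetical:

    import h5py

    with h5py.File("pixels.h5", "a") as f:
        grp = f.require_group("pixels_root/Pixels_01")
        # Record the pixel type of the *uncompressed* blobs explicitly,
        # so compressed blobs remain self-describing.
        grp.attrs["PixelType"] = "uint16"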

> In general I think I'd do something less like XML. Obviously we need
> to maintain an isomorphism to OME-XML so that a conversion can be
> performed for any tool that needs to work on one rather than the
> other, but it seems that we could make the mapping a bit less
> explicit. I know that's how the OME-TIFF implementation works, but it
> seems OME-HDF is significantly different.

Yep, unsurprisingly, your example is less like XML, and more like
databases. :-)

> For starters, it has actual highly-performant ways of encoding all the
> types which are represented as string in XML. For anyone who doesn't
> already have code which works with our OME-XML, having direct access
> to the values could be beneficial. Not just that, barring the lack of
> foreign key constraints, one could conceivably dump an entire database
> to a single, massive OME-HDF file, which is an interesting if extreme
> use case.

We agree that this approach makes sense. We are fine with changing the
specification so that each attribute is stored using its native type from
the schema rather than as a string. That is slightly more work for the
OME-HDF writer, but more informative.

One downside is that then OME-HDF could not be used to express invalid
OME-XML -- is that a problem?
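
To illustrate the "more like databases" direction, a writer could store
Pixels metadata as a compound-typed dataset, one natively typed row per
Pixels element; a sketch in which the field names are hypothetical:

    import h5py
    import numpy as np

    # One natively typed row per Pixels element.
    row_t = np.dtype([
        ("Id", h5py.string_dtype()),
        ("SizeX", np.int32), ("SizeY", np.int32), ("SizeZ", np.int32),
        ("SizeC", np.int32), ("SizeT", np.int32),
    ])
    rows = np.array(
        [("urn:lsid:example.com:Pixels:01", 320, 240, 1, 16, 16)],
        dtype=row_t)
    with h5py.File("meta.h5", "w") as f:
        f.create_dataset("OME/Metadata/Pixels", data=rows)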

> I might just be wrong, which is fine, but the benefit of strings like
> "xmlns:CA:http://..." are that there are libraries to parse them. For
> HDF there aren't. I'm open to the need for a namespace concept, but
> perhaps it can be in an HDF-metaphor, not just XML in HDF.

In this case, the purpose is really just to document the version of the
specification being used, right? So perhaps we should just have an
OME-HDF version node rather than using XML namespaces.
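
In h5py terms that could be as simple as an attribute on the root group;
a tiny sketch (the attribute name and value are hypothetical):

    import h5py

    with h5py.File("omehdf.h5", "a") as f:
        f.require_group("OME").attrs["OME-HDF-Version"] = "2008-06-draft"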

> Which should be doable right? HDF is intended as a self-documenting
> format like XML. It probably makes sense then to make the naming and
> grouping a little less terse. And instead of naming each group with a
> unique id, one could also use an array, which brings me to the
> somewhat quirky files attached below.  This is what I was referring to
> on Teamspeak, Curtis. Basically I tried to take a very small subset of
> the model elements and attributes and encode them as efficiently as
> possible. That's probably not the actual goal, but it makes for a good
> comparison.

We agree that being less terse has advantages. It's easier to read. If
that's a goal, fine.

Your example is aesthetically pleasing as far as it goes. It becomes a
little more challenging when you start enumerating the blobs like we did
in our draft. For example:

|-+ pixels_root
  |-+ urn:lsid:example.com:Pixels:01
    |-+ 5d_p:2,c:16,t:16,z:1,y:240,x:320
    |  \- 42
    |-+ 6p_p:0
    | |-+ 7p_c:0
    | | |-+ 8p_t:0
    | |   |-+ 9p_z:0
    | |      \- (a 2d dataset containing x and y)
    | |-+ 10p_c:1
    |   |-+ 11p_t:0
    |     |-+ 12c_LZW
    |        \- (a 3d dataset containing x, y, and z, LZW compressed)
    |-+ 13p_p:1

Presumably, we want to be able to graft a structure like this onto a
node like your "Id0" beneath OME/Data/Pixels. There are some design
choices we have to make in order to do this.

As discussed above, we want to be able to subdivide an N-dimensional
pixels structure into blobs of various configurations, depending on the
situation. We assume that your Id0 node will be a non-leaf in the case
of such a subdivision. Then the question is, how do we label the nodes
in the child structure? Of course, they must be uniquely named. We could
use a structure similar to our original proposal but more verbose. E.g.:

|-+ Id0
  |-+ Id0-p0
  | |-+ Id0-p0-c0
  | | |-+ Id0-p0-c0-t0
  | |   |-+ Id0-p0-c0-t0-z0
  | |      \- (a 2d dataset containing x and y)
  | |-+ Id0-p0-c1
  |   |-+ Id0-p0-c1-t0
  |     |-+ Id0-p0-c1-t0-Compression_LZW
  |        \- (a 3d dataset containing x, y, and z, LZW compressed)
  |-+ Id0-p1

Still ugly, but perhaps less so? Note that this structure assumes the
LSID, dimensional sizes and (importantly) pixel type are encoded in the
OME/Pixel leaf elsewhere as in your example. This also eliminates the
need for the mysterious 42 node.
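
For concreteness, a writer might emit the structure above along these
lines; an h5py sketch in which the dataset contents are placeholders, and
gzip stands in for LZW since LZW is not a built-in HDF5 filter:

    import h5py
    import numpy as np

    with h5py.File("blobs.h5", "w") as f:
        root = f.create_group("OME/Data/Pixels/Id0")
        # An uncompressed XY plane at p0/c0/t0/z0.
        root.create_dataset(
            "Id0-p0/Id0-p0-c0/Id0-p0-c0-t0/Id0-p0-c0-t0-z0",
            data=np.zeros((240, 320), dtype=np.uint16))
        # A compressed XYZ block at p0/c1/t0 (gzip standing in for LZW).
        root.create_dataset(
            "Id0-p0/Id0-p0-c1/Id0-p0-c1-t0/Id0-p0-c1-t0-Compression_LZW",
            data=np.zeros((1, 240, 320), dtype=np.uint16),
            compression="gzip")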

Another issue is that if we take this approach, we would need to expand
the definition of "dimension order" to include more than the current 5D
OME model. We need things like polarization, spectra and lifetime, and
it makes sense to allow any number of dimensions. Our design allows
multi-character dimension labels, which must be considered when
expressing the dimensional rasterization order as a single "order"
string as you have done.

> On another note, if this is a specification that is truly adopted
> throughout the community, it'd be nice to go as far as we can with the
> future proofing of it. Taking things to the extreme, imagine that the
> HDF spec supersedes the XML spec at some point. It seems inelegant to
> continue naming things "xml_".

Fair enough. We can call the two main nodes something like "Metadata"
and "Pixels."

> Another area where this comes to play is the restriction on a single
> pixels set in the file. For reasons of performance and maintainability
> it may make sense to also have derivative pixels within the same file.
> A prime example of this is subsampled images which can aid in
> visualization. This would lead to a structure something like:
>
>  /Pixels/Acquired
>  /Pixels/SomeDerivationMethod
>  /Pixels/SomeOtherDerivationMethod
>  /Pixels/Subsample1
>  /Pixels/Subsample2
>
> with the first Pixels always being the default for a single file. This
> also may be of importance in the case where the original data is
> compressed. Assuming a client has compressed an OME-HDF file for
> transport, the server might then want to decompress all or portions of
> the data for faster access:
>
>  /Pixels/Uncompressed
>
> in which case it might also be nice to have a required field on the
> acquired image specifying whether or not it is at any point in its
> structure compressed or incomplete.

First of all, to be clear, there is no restriction on pixels sets within
a file. Correct me if I'm wrong, but both our proposal and your example
allow multiple sets of pixels.

That said, your comments make sense. You are essentially providing a
layer of nodes beneath Pixels for categorization purposes, which makes
the isomorphism a little more complicated, but with defined rules would
work fine.

In general, the main danger I see is designing something that can be
expressed in HDF but not in the current OME-XML schema, but we can
easily amend the schema to address such issues.

> Another option would be to use the HDF filters
> (http://hdf.ncsa.uiuc.edu/HDF5/doc/Filters.html) for transparent
> compression, though this doesn't help with incomplete data.

We will have to investigate the details, but tentatively it is an
interesting idea. Do you know where the filter code gets stored? It
seems like a tricky issue, especially if we want to define our own
filters.

As a side note, it looks like the HDF site has been reorganized. The new
Filters URL is: http://hdfgroup.org/HDF5/doc/H5.user/Filters.html
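
For comparison, here is how the built-in filter pipeline looks from h5py;
filters apply per dataset at creation time, and decompression is
transparent on read (file name and sizes are hypothetical):

    import h5py
    import numpy as np

    with h5py.File("filtered.h5", "w") as f:
        # Filters require chunked storage; reads decompress transparently.
        d = f.create_dataset(
            "pixels", shape=(16, 240, 320), dtype=np.uint16,
            chunks=(1, 240, 320),
            compression="gzip", compression_opts=4,
            shuffle=True)  # byte-shuffle often improves compression ratios
        d[0] = np.zeros((240, 320), dtype=np.uint16)

As far as we understand, the file itself records only each dataset's
filter IDs and parameters; the filter implementations live in the HDF5
library, so custom filters would have to be distributed to and registered
by every reader, which is exactly what makes them tricky.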

> Finally, to the category "Other topics" from the spec, here are some
> other topics to consider:
>
>  * What are "pixels group nodes"?

We are not clear what you mean by this question.

> * views on pixels : It is possible to have views (via references or
>   named dataspaces) within an HDF file. So, one could imagine some
>   interesting scenarios. A vendor could take an OME-HDF file and
>   create views for using it via their own client; or that some
>   library like bio-formats could "normalize" a file for use as
>   OME-HDF. (See http://hdf.ncsa.uiuc.edu/HDF5/doc/References.html for
>   example) To make this possible, it might make sense to put
> everything in a top /OME group similar to XML. This allows other
>   vendors to define their own space, which can either be handled via
>   the "Namespace" table in the example below, or via iana.org-like
>   reservations. But in general, we should consider allowing extra
>   data in a single file.  (Relatedly, it would probably be necessary
>   to have CA and STD as separate top-level namespaces with pointers
>   into /OME, since the data structures are static)

We do not initially see any problems with this approach, and agree that
OME data should be able to coexist with non-OME data. I'm not sure what
you mean about the CA and STD namespaces, though.
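
To make the "views" idea concrete, here is an h5py sketch using a region
reference, the mechanism you cite; the vendor group name is hypothetical:

    import h5py
    import numpy as np

    with h5py.File("views.h5", "w") as f:
        src = f.create_dataset(
            "OME/pixels", data=np.zeros((16, 240, 320), dtype=np.uint16))
        # A vendor-defined "view": a region reference into the OME data.
        view = f.create_dataset(
            "AcmeVendor/first_plane", shape=(), dtype=h5py.regionref_dtype)
        view[()] = src.regionref[0, :, :]

    with h5py.File("views.h5", "r") as f:
        ref = f["AcmeVendor/first_plane"][()]
        plane = f[ref][ref]  # dereference, then apply the region selection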

> * checksums : There is currently a proposal, albeit an old one, in the
>   HDF5 community for file checksums. We should most likely consider either
>   taking part in that proposal or at least implementing something
>   similar in our own format. Possibilities range from providing a
>   checksum for the final data structure when/if complete, or
>   (optionally?) providing checksums for any binary blocks.
>
> http://www.hdfgroup.org/HDF5/doc_resource/H5Checksum/ChecksumProposal.htm
>   http://www.hdfgroup.org/HDF5/doc_resource/H5Checksum/EDC_spec.htm

Fine. Andrew recently added a checksum feature to the OME-XML
specification--easy to do something similar for HDF.
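
For per-block checksums specifically, HDF5's Fletcher-32 checksum filter
covers much of this ground; an h5py sketch (file name and sizes are
hypothetical):

    import h5py

    with h5py.File("checked.h5", "w") as f:
        # fletcher32 stores a checksum per chunk; a corrupted chunk then
        # fails loudly at read time instead of returning bad pixels.
        f.create_dataset(
            "pixels", shape=(16, 240, 320), dtype="uint16",
            chunks=(1, 240, 320), fletcher32=True)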

> * HDF provides the ability to link to external files, which might
>   allow for a "stitching" together of existing files for certain
>   vendors. This negates the benefit of having a single-file format,
>   but it should at least be looked at.

Worth mentioning, sure. It is less important than with OME-TIFF, though.
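
For reference, external links look like this from h5py; the file names
and paths are hypothetical:

    import h5py

    with h5py.File("master.h5", "w") as f:
        grp = f.require_group("OME/Data/Pixels")
        # Graft a dataset from another file into this file's namespace.
        grp["Id0"] = h5py.ExternalLink("vendor_plane_000.h5", "/pixels")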

> * Pixel(-Dimension) ordering : Is it useful in terms of performance
>   (storage, retrieval, fewer data transformations) to allow some
>   limited number of alternate byte-ordering (rasterizing)?

The dimension order issue is deceptively complex, and should probably be
discussed more thoroughly. We will need to settle on the tree structure,
as we were discussing above, to finalize this.

What sorts of use cases are you envisioning when you say "alternate
byte-ordering"? Do you mean storing the same pixels set with multiple
different orderings, for performance reasons? Seems reasonable to allow
this.
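
If so, a sketch of what that might look like in h5py; the group names and
the "Order" attribute are hypothetical, not part of the draft spec:

    import h5py
    import numpy as np

    data_tzyx = np.zeros((16, 8, 240, 320), dtype=np.uint16)
    with h5py.File("orders.h5", "w") as f:
        a = f.create_dataset("Pixels/Acquired", data=data_tzyx)
        a.attrs["Order"] = "t,z,y,x"
        # A second copy rasterized z-first, for z-major access patterns.
        b = f.create_dataset(
            "Pixels/ZFirst",
            data=np.ascontiguousarray(data_tzyx.transpose(1, 0, 2, 3)))
        b.attrs["Order"] = "z,t,y,x"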

> * Chunking : There was no mention of "chunking", the HDF term for
>   tiling. I know this is of interest to several groups, though I'm not
>   sure if it has to be specified since it is handled by the
>   library. However, bringing it to the attention of the potential
>   end-user is probably a good idea. (For example a client would want
>   to know on what boundaries data is chunked to maximize read times.)

Sure, it can be mentioned in the "other topics" section that there is no
reason binary blocks cannot be chunked.
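
For instance, a client can query the chunk layout directly; a small h5py
sketch, reading the per-plane chunked file from the earlier sketch:

    import h5py

    with h5py.File("blob_demo.h5", "r") as f:
        d = f["pixels_root/Pixels_01/data"]
        # Align client reads to these boundaries to minimize chunk I/O.
        print(d.chunks)  # e.g. (1, 1, 1, 240, 320); None if contiguous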

> * Locking, read-only, read-write : each HDF file is an entire
>   file-system for all intents and purposes, and though difficult to
>   enforce, it might be required to provide "suggestions" in addition
>   to what the OS provides.

One way to enforce such things could be to use FUSE to mount OME-HDF
files as filesystems. We were recently at a file formats meeting where
this was discussed, and the implications are exciting. Here is Matt
Dougherty's abstract:

http://www.medsbio.org/meetings/BNL_May08_imgCIF_Workshop_files/Dougherty_Abstract.pdf

> Alright. That was a lot of specifics for an initial email, but like I
> said, we're just very excited to have this underway. Thanks for making
> that happen!

Sure, and hopefully we can make some more progress at the developer
meeting.

-Curtis


On Thu, Dec 13, 2007 at 3:00 PM, <josh.moore at gmx.de> wrote:
> Curtis Rueden writes:
> > Hi Aaron and everyone,
> >
> > Attached is the latest draft of our OME-HDF proposal. Feedback is
> > welcome.
> >
> > One notable thing that needs improvement is the lack of detail on
> > the Pixels XML node(s) in the xml_root tree. The base64 format uses
> > <BinData> tags, and OME-TIFF uses <TiffData> tags -- should OME-HDF
> > just include an empty <HDFData> tag? Is doing so even necessary,
> > since the XML has been reformatted as an HDF group tree? Any
> > thoughts and opinions are appreciated.
> >
> > Thanks, Curtis
>
> Hi Curtis.
>
> Thanks for sending this around. It's great to have something concrete
> to compare our needs against. In our case that includes the
> possibility of representing the internal format that will be used by
> the server for sending both raw and rendered data to clients. The
> primary requirement for that is that a pixels representation can be
> chosen which optimizes for speed. It seems like the dimensions
> "short-circuiting" -- allowing a blob to be put at any point in the
> pixels tree -- is an ideal solution.
>
> Regarding the metadata, however, my naive approach would have been
> quite different, but basically centers around using HDF datatypes to
> the fullest. Encoding dimensions as a single string is one example of
> something that can be done straightforwardly in HDF. Though maybe I'm
> missing something, since I don't understand the purpose of the
> 1-dimensional dataset (0-dimensional?) "42" under your dimensions
> specifier.
>
> In general I think I'd do something less like XML. Obviously we need
> to maintain an isomorphism to OME-XML so that a conversion can be
> performed for any tool that needs to work on one rather than the
> other, but it seems that we could make the mapping a bit less
> explicit. I know that's how the OME-TIFF implementation works, but it
> seems OME-HDF is significantly different.
>
> For starters, it has actual highly-performant ways of encoding all the
> types which are represented as string in XML. For anyone who doesn't
> already have code which works with our OME-XML, having direct access
> to the values could be beneficial. Not just that, barring the lack of
> foreign key constraints, one could conceivably dump an entire database
> to a single, massive OME-HDF file, which is an interesting if extreme
> use case.
>
> I might just be wrong, which is fine, but the benefit of strings like
> "xmlns:CA:http://..." are that there are libraries to parse them. For
> HDF there aren't. I'm open to the need for a namespace concept, but
> perhaps it can be in an HDF-metaphor, not just XML in HDF.
>
> Which should be doable right? HDF is intended as a self-documenting
> format like XML. It probably makes sense then to make the naming and
> grouping a little less terse. And instead of naming each group with a
> unique id, one could also use an array, which brings me to the
> somewhat quirky files attached below.  This is what I was referring to
> on Teamspeak, Curtis. Basically I tried to take a very small subset of
> the model elements and attributes and encode them as efficiently as
> possible. That's probably not the actual goal, but it makes for a good
> comparison.
>
> On another note, if this is a specification that is truly adopted
> throughout the community, it'd be nice to go as far as we can with the
> future proofing of it. Taking things to the extreme, imagine that the
> HDF spec supersedes the XML spec at some point. It seems inelegant to
> continue naming things "xml_".
>
> Another area where this comes to play is the restriction on a single
> pixels set in the file. For reasons of performance and maintainability
> it may make sense to also have derivative pixels within the same file.
> A prime example of this is subsampled images which can aid in
> visualization. This would lead to a structure something like:
>
>  /Pixels/Acquired
>  /Pixels/SomeDerivationMethod
>  /Pixels/SomeOtherDerivationMethod
>  /Pixels/Subsample1
>  /Pixels/Subsample2
>
> with the first Pixels always being the default for a single file. This
> also may be of importance in the case where the original data is
> compressed. Assuming a client has compressed an OME-HDF file for
> transport, the server might then want to decompress all or portions of
> the data for faster access:
>
>  /Pixels/Uncompressed
>
> in which case it might also be nice to have a required field on the
> acquired image specifying whether or not it is at any point in its
> structure compressed or incomplete. Another option would be to use the
> HDF filters (http://hdf.ncsa.uiuc.edu/HDF5/doc/Filters.html) for
> transparent compression, though this doesn't help with incomplete
> data.
>
> Finally, to the category "Other topics" from the spec, here are some
> other topics to consider:
>
>  * What are "pixels group nodes"?
>
>  * views on pixels : It is possible to have views (via references or
>    named dataspaces) within an HDF file. So, one could imagine some
>    interesting scenarios. A vendor could take an OME-HDF file and
>    create views for using it via their own client; or that some
>    library like bio-formats could "normalize" a file for use as
>    OME-HDF. (See http://hdf.ncsa.uiuc.edu/HDF5/doc/References.html for
>    example) To make this possible, it might make sense to put
>    everything in a top /OME group similar to XML. This allows other
>    vendors to define their own space, which can either be handled via
>    the "Namespace" table in the example below, or via iana.org-like
>    reservations. But in general, we should consider allowing extra
>    data in a single file. (Relatedly, it would probably be necessary
>    to have CA and STD as separate top-level namespaces with pointers
>    into /OME, since the data structures are static)
>
>  * checksums : There is currently a proposal, albeit an old one, in the
>    HDF5 community for file checksums. We should most likely consider either
>    taking part in that proposal or at least implementing something
>    similar in our own format. Possibilities range from providing a
>    checksum for the final data structure when/if complete, or
>    (optionally?) providing checksums for any binary blocks.
>
> http://www.hdfgroup.org/HDF5/doc_resource/H5Checksum/ChecksumProposal.htm
> http://www.hdfgroup.org/HDF5/doc_resource/H5Checksum/EDC_spec.htm
>
>  * HDF provides the ability to link to external files, which might
>    allow for a "stitching" together of existing files for certain
>    vendors. This negates the benefit of having a single-file format,
>    but it should at least be looked at.
>
>  * Pixel(-Dimension) ordering : Is it useful in terms of performance
>    (storage, retrieval, fewer data transformations) to allow some
>    limited number of alternate byte-ordering (rasterizing)?
>
>  * Chunking : There was no mention of "chunking", the HDF term for
>    tiling. I know this is of interest to several groups, though I'm
>    not sure if it has to be specified since it is handled by the
>    library. However, bringing it to the attention of the potential
>    end-user is probably a good idea. (For example a client would want
>    to know on what boundaries data is chunked to maximize read times.)
>
>  * Locking, read-only, read-write : each HDF file is an entire
>    file-system for all intents and purposes, and though difficult to
>    enforce, it might be required to provide "suggestions" in addition
>    to what the OS provides.
>
> Alright. That was a lot of specifics for an initial email, but like I
> said, we're just very excited to have this underway. Thanks for making
> that happen!
>
> Cheers,
> ~Josh.
>
>
> + / (root)
> |-+ OME
>  |-+ Metadata
>  | |-+ 1e_OME
>  |   |-+ 2a_xmlns
>  |   |  \- http://www.openmicroscopy.org/Schemas/OME/2007-06
>  |   |-+ 3a_xmlns:CA
>  |   |  \- http://www.openmicroscopy.org/Schemas/CA/2007-06
>  |   |-+ 4c
>  |   |  \- example
>  |   ...
>  |-+ Pixels
>    |-+ Acquired
>    | |-+ Id
>    | |  \- urn:lsid:example.com:Pixels:01
>    | |-+ Dimensions
>    | | |+ X
>    | | | \- 3
>    | | |+ Y
>    | | | \- 9
>    | | ...
>    |-+ Uncompressed
>    | |-+ Id
>    | |  \- urn:lsid:example.com:Pixels:02
>    ...
>    |-+ urn:lsid:example.com:Pixels:01
>      |-+ 5d_p:2,c:16,t:16,z:1,y:240,x:320
>      |  \- 42
>      |-+ 6p_p:0
>      | |-+ 7p_c:0
>      | | |-+ 8p_t:0
>      | |   |-+ 9p_z:0
>      | |      \- (a 2d dataset containing x and y)
>      | |-+ 10p_c:1
>      |   |-+ 11p_t:0
>      |     |-+ 12c_LZW
>      |        \- (a 3d dataset containing x, y, and z, LZW compressed)
>      |-+ 13p_p:1
>
>
> > On Nov 7, 2007 2:41 AM, Ponti, Aaron <aaron.ponti at fmi.ch> wrote:
> > > Hi Curtis
> > >
> > > Thanks a lot: your effort is very good news for us. As you
> > > suggested, I would indeed like to get a few more details on the
> > > OME-HDF specifications from Eric, if possible.
> > >
> > > There are several reasons for us to push the HDF way: possibly the
> > > most important one is that we depend on several vendors for our
> > > facility -- both from the hardware and the software side. Over the
> > > last few months there has been a slight increase in the support of
> > > the OME-XML model (or at least __some__ XML model that could be
> > > pushed in the direction of OME, e.g. Leica, Zeiss) and (independently) of
> > > HDF5 (most notably Bitplane's Imaris and The MathWorks's MATLAB).
> > > With a good push we might get them to converge.
> > >
> > > Another reason is that some solutions that create thousands of
> > > files per dataset are simply too cumbersome to handle (and store).
> > > On the other hand, (multi-page) TIFF does not lend itself to the
> > > new trend of "huge datasets".
> > >
> > > A third one is that we are trying to push an image-processing
> > > platform among our fellows (based on existing tools) that will
> > > rely on a "standard" file format that does not suffer from the
> > > drawbacks of our current (imposed) options and also gives us more
> > > flexibility for possibly storing derived data (this last point is
> > > still in open debate).
> > >
> > > To answer Jason's question: we are willing to help of course, but
> > > I think LOCI's effort is where we would start from.
> > >
> > > Aaron
> > >
> > >
> > > ----------------------------------------------------------------------
> > > | Dr. Aaron C. Ponti
> > > | Friedrich Miescher Institute
> > > | Facility for Advanced Microscopy and Imaging
> > > | Software development
> > > | Maulbeerstrasse 66 CH-4058, Basel
> > > | WRO-1066.2.32
> > > | Tel: +41 61 697 3012
> > > | Fax: +41 61 697 3976
> > > | http://www.fmi.ch/faim
> > >
> > > ----------------------------------------------------------------------
> > >
> > >
> > > -----Original Message-----
> > > From: Curtis Rueden
> > > Sent: Monday, November 05, 2007 4:15 PM
> > > To: Aaron Ponti
> > > Subject: Re: OME-HDF5
> > >
> > > Hi Aaron,
> > >
> > > > Last spring (at the OME meeting in Paris) we discussed the
> > > > possible use of the HDF5 file container for a next incarnation
> > > > of the OME-XML or OME-TIFF formats. You told us that this was a
> > > > direction you guys at LOCI were already investigating.
> > >
> > > We have been investigating the idea of an OME-HDF file format for
> > > the past several months, and have a rough specification in mind
> > > for it. The basic idea is to represent the XML metadata as an HDF5
> > > tree structure, and also store the pixels in a tree structure that
> > > branches according to the data's dimensional axes. The leaves
> > > could be individual image planes, or blocks of image planes across
> > > one or more dimensions, depending on the use case and compression
> > > scheme chosen. If you want more technical details, Eric can
> > > elaborate further.
> > >
> > > > Although we think OME-XML is appropriate for storing the
> > > > metadata associated with the experiments, we still have some
> > > > concerns regarding the appropriateness of using the standard
> > > > text-based OME-XML file format or the also somewhat limited
> > > > OME-TIFF alternative to store the actual pixel data. For this
> > > > reason we are looking for an alternative and the HDF5 file
> > > > container seems to be a promising candidate.
> > >
> > > Could you describe your group's use case for HDF a little more?
> > > What is it about HDF that you like that TIFF cannot provide for
> > > you as effectively? I want to make sure that the format ultimately
> > > meets your needs.
> > >
> > > At LOCI, we have been focusing primarily on the ability to
> > > represent the additional dimensions of lifetime, emission spectra,
> > > and polarization elegantly, though the design I described above is
> > > by no means limited to these specific attributes. We are also
> > > interested in applying compression schemes (both lossy and
> > > lossless) that take advantage of redundancy across dimensions,
> > > such as multidimensional wavelet transforms, and OME-HDF is being
> > > designed to accommodate this goal.
> > >
> > > > I of course see the irony of complaining about the vendors
> > > > creating new file formats and at the same time proposing a new
> > > > format myself
> > >
> > > I think it is less difficult for vendors to embrace a standard
> > > from a non-corporate entity than it would be from a competitor.
> > > Nonetheless, for them to do so will require a strong message from
> > > customers.
> > >
> > > > these days there are more and more commercial and open-source
> > > > tools that are starting to use HDF5 (obviously with different
> > > > internal structures) as their "native" file format, and it might
> > > > be easier to push for a standard if people are already walking
> > > > the same (or at least a parallel) path.
> > >
> > > There is quite a lot of momentum for HDF in microscopy recently,
> > > and HDF looks like the best candidate for a general-purpose, open
> > > container format. One technical limitation we are currently
> > > struggling with is the fact that while there is good library
> > > support for HDF5 in C/C++, and a good Java reader implementation
> > > in the netCDF4 libraries, we know of no pure Java HDF writer
> > > implementation. This complicates our Bio-Formats effort, as we
> > > will either need to write such code ourselves, or use a Java
> > > wrapper over native code, which would restrict deployment of
> > > OME-HDF writer functionality to supported platforms. Given the
> > > limited time we have available, our current plan is the native
> > > code solution, at least until a Java-based HDF writer becomes
> > > available.
> > >
> > > That's about it for where OME-HDF currently stands. We welcome any
> > > comments or further questions.
> > >
> > > -Curtis
> > >
> > >
> > > On 11/5/07, Jason Swedlow <jason at lifesci.dundee.ac.uk> wrote:
> > > > Hi Aaron-
> > > >
> > > > Thanks for your interest.  HDF does certainly have some
> > > > interesting features, and we have been looking at it.
> > > >
> > > > Most importantly, tell us how interested you are, and if you can
> > > > help. If so, then great.  If not, that is fine, but we have a lot
> > > > going on, and we will get to this-- we are trying to get funding
> > > > now.  Any help would be most appreciated.
> > > >
> > > > There is general agreement on the project that supporting
> > > > multiple binary formats that include OME-XML is the way to go.
> > > > There is no reason to be religious about TIFF, especially if HDF
> > > > has good library support.
> > > >
> > > > Cheers,
> > > >
> > > > Jason
> > > >
> > > >
> > > > On 2 Nov 2007, at 12:31, Ponti, Aaron wrote:
> > > >
> > > > > Dear Curtis
> > > > >
> > > > > Last spring (at the OME meeting in Paris) we discussed the
> > > > > possible use of the HDF5 file container for a next incarnation
> > > > > of the OME-XML or OME-TIFF formats. You told us that this was
> > > > > a direction you guys at LOCI were already investigating.
> > > > >
> > > > > We are a microscopy and imaging facility located in Basel,
> > > > > Switzerland, and we are trying to push the OME XML scheme as a
> > > > > standard among our fellow facilities and labs around
> > > > > Switzerland and Southern Germany. The goal is to build the
> > > > > critical mass needed to put pressure on the microscopy
> > > > > vendors to stop creating new file formats every second day.
> > > > >
> > > > > Although we think OME-XML is appropriate for storing the
> > > > > metadata associated with the experiments, we still have some
> > > > > concerns regarding the appropriateness of using the standard
> > > > > text-based OME-XML file format or the also somewhat limited
> > > > > OME-TIFF alternative to store the actual pixel data. For this
> > > > > reason we are looking for an alternative and the HDF5 file
> > > > > container seems to be a promising candidate.
> > > > >
> > > > > I of course see the irony of complaining about the vendors
> > > > > creating new file formats and at the same time proposing a new
> > > > > format myself, but these days there are more and more
> > > > > commercial and open-source tools that are starting to use HDF5
> > > > > (obviously with different internal structures) as their
> > > > > "native" file format, and it might be easier to push for a
> > > > > standard if people are already walking the same (or at least a
> > > > > parallel) path.
> > > > >
> > > > > What do you think? Are you guys still looking into a possible
> > > > > OME-HDF5 file format? Do you think what I am saying makes some
> > > > > sense?
> > > > >
> > > > > Thanks a lot for any feedback
> > > > > Aaron


On Wed, May 9, 2012 at 10:56 PM, Jason Swedlow wrote:
> Hi Stephan-
>
> Thanks for this comment.
>
> Indeed, Bio-Formats has supported HDF5 formats for a few years.
> Imaris' .ims format uses HDF5, and Bio-Formats reads this format. This
> work was performed based on a good set of example files, which we
> supported.
>
> The EMBL group has released support for HDF5 through R, especially for
> HCS data:
>
> http://www.bioconductor.org/packages/devel/bioc/html/rhdf5.html
>
> Moreover, Sorger's group has published a specification (called
> "SDCubes") for a pair of HDF5 and XML files that provide a data vessel
> and specification:
>
> http://www.nature.com/nmeth/journal/v8/n6/full/nmeth.1600.html
>
> As you imply, there are others....
>
> In OME, we're most interested in enabling access to these (and other,
> of course!) new formats. We always work with example data-- sets of
> files that comprehensively represent the data stored in these file
> formats. So, if you have examples of a well-used format stored in HDF5,
> send us the data and we'll add it to the list of files we are working
> on.
>
> We are also interested in mechanisms to transport files between
> different users and applications-- that's what OME-TIFF is all about:
>
> http://www.openmicroscopy.org/site/support/file-formats/ome-tiff
>
> OME-TIFF is based on great libraries for reading and writing TIFF and
> XML. These are well-established and available to all for use.
>
> HDF5 might certainly be a way to transport metadata, analytics, etc.
> But there are a large number of different types of data that could be
> stored in an HDF5 file, including images, metadata, analytics,
> annotations, etc. If a common, well-defined and performant
> specification emerges for this, we will support it. We'd emphasise
> that this will require not just a good spec, but also good, supported,
> maintained, cross-platform software libraries for reading and writing
> these data types, and good, well-worked example files.
>
> Thanks again for the note.
>
> Cheers,
>
> Jason
>
> Jason Swedlow | Wellcome Trust Centre for Gene Regulation & Expression |
> Open Microscopy Environment | University of Dundee
>
>
> On Wed, May 9, 2012 at 4:44 AM, Stephan Gerhard wrote:
> > Hi,
> >
> > More and more software is adopting HDF5 as an efficient storage
> > mechanism for image data.
> >
> > I'd like to know what the current plans are in supporting HDF5 I/O
> > for bioimage datasets, e.g. through Bio-Formats. For instance, I
> > would like to have an easy way to save 4D or 5D images from
> > ImageJ/Fiji to HDF5 for further processing.
> >
> > Thanks for the update,
> > Stephan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example.png
Type: image/png
Size: 109260 bytes
Desc: not available
URL: <http://lists.openmicroscopy.org.uk/pipermail/ome-devel/attachments/20120511/c458a058/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example.py
Type: application/octet-stream
Size: 4780 bytes
Desc: not available
URL: <http://lists.openmicroscopy.org.uk/pipermail/ome-devel/attachments/20120511/c458a058/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example.h5
Type: application/octet-stream
Size: 125828 bytes
Desc: not available
URL: <http://lists.openmicroscopy.org.uk/pipermail/ome-devel/attachments/20120511/c458a058/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: omehdf-2007-12-11.pdf
Type: application/pdf
Size: 131092 bytes
Desc: not available
URL: <http://lists.openmicroscopy.org.uk/pipermail/ome-devel/attachments/20120511/c458a058/attachment-0001.pdf>

