[ome-devel] ome-devel Digest, Vol 81, Issue 5

Curtis Rueden ctrueden at wisc.edu
Wed Dec 8 19:57:06 GMT 2010


Hi Alessandro,

> Based on your experience, how much of an increase in size could we expect
> from "one in ten" or "one in a hundred" files with metadata redundancy? I
> think some estimates would be of great help in better understanding what
> the impact of this implementation could be on IT departments and HCS
> facility operations.
>

From what Rubén told me, a typical situation might be 2.5MB of binary data
(pixels) per TIFF file and 5.5MB of OME-XML, i.e. 8MB per file. Over 23,000
TIFF files, that's 180GB when the metadata is stored in every file, but only
56GB if it is stored only once, a difference of more than 3x. Storing the
metadata in 1/10th of the TIFFs would require ~69GB of storage, which amounts
to over 12GB of wasted disk. Storing the metadata in 1/100th of the TIFFs
would require ~57GB, wasting a mere 1GB of disk.
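The arithmetic above can be reproduced with a short script. This is a rough sketch that assumes the per-file sizes quoted above (2.5MB of pixels, 5.5MB of OME-XML, 23,000 files) and takes 1GB as 1024MB, the convention that reproduces the 180GB and 56GB totals. "Redundant" counts every metadata copy beyond the single one a master/slave layout needs.

```python
# Storage model for replicating the full OME-XML in a subset of the TIFFs.
# Figures are the estimates quoted above; 1 GB is taken as 1024 MB.

PIXELS_MB = 2.5    # pixel data per TIFF (one plane per file)
XML_MB = 5.5       # full OME-XML block describing the whole dataset
N_FILES = 23_000   # TIFF files in the dataset


def total_gb(copies: int) -> float:
    """Total dataset size when `copies` files embed the full OME-XML."""
    return (N_FILES * PIXELS_MB + copies * XML_MB) / 1024


def redundant_gb(copies: int) -> float:
    """Space spent on metadata copies beyond the single one that is required."""
    return total_gb(copies) - total_gb(1)


for label, copies in [
    ("every file", N_FILES),
    ("1 in 10", N_FILES // 10),
    ("1 in 100", N_FILES // 100),
    ("master only", 1),
]:
    print(f"{label:>11}: {total_gb(copies):6.1f} GB total, "
          f"{redundant_gb(copies):6.1f} GB redundant")
```

With these inputs it reports about 179.7, 68.5, 57.4 and 56.2GB in total, with roughly 12.3GB and 1.2GB of redundant metadata for the one-in-ten and one-in-a-hundred cases.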

To be clear, I think it is fine to adopt such a strategy, but my point is
that it should be the institution's choice. With the master/slave proposal,
it would be totally configurable how often to replicate the OME-XML
metadata. You could store the metadata for one file only, for all files, or
for some subset as you propose.
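One way to picture that configurability is a single replication setting. The sketch below is hypothetical: `every`, `metadata_carriers` and `metadata_source` are illustrative names, not part of OME-TIFF, Bio-Formats or OMERO. It decides which files embed the full OME-XML and which carrier a slave file should read its metadata from.

```python
from typing import List, Optional


def metadata_carriers(n_files: int, every: Optional[int]) -> List[int]:
    """Indices of files that embed the complete OME-XML (hypothetical policy).

    every=None -> master file only (index 0)
    every=1    -> every file, i.e. full duplication
    every=k    -> file 0 and every k-th file after it
    """
    if every is not None and every < 1:
        raise ValueError("every must be None or >= 1")
    if n_files <= 0:
        return []
    if every is None:
        return [0]
    return list(range(0, n_files, every))


def metadata_source(index: int, every: Optional[int]) -> int:
    """Index of the carrier a given file should read its metadata from.

    Each file points back to the nearest carrier at or before it. Because
    every carrier holds the complete OME-XML, any surviving carrier can
    stand in if that one is lost.
    """
    if every is None:
        return 0
    return index - index % every
```

For the 23,000-file example, `metadata_carriers(23_000, 10)` selects 2,300 files, and file 57 would read its metadata from file 50.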

-Curtis

On Wed, Dec 8, 2010 at 12:59 PM, Alessandro Dellavedova <
alessandro.dellavedova at ifom-ieo-campus.it> wrote:

> Hi Curtis and Rubén,
>
> On Dec 8, 2010, at 5:51 PM, Curtis Rueden wrote:
>
> > Alessandro wrote:
> > Does it make sense to add a level of redundancy, for example requiring one
> > in ten files to carry the complete headers, in order to avoid losing the
> > metadata if the master file gets deleted/corrupted/abducted by aliens?
> >
> > For large numbers of files, I think any mandated level of redundancy will
> > still result in an undesirable increase in size.
>
> Based on your experience, how much of an increase in size could we expect
> from "one in ten" or "one in a hundred" files with metadata redundancy? I
> think some estimates would be of great help in better understanding what
> the impact of this implementation could be on IT departments and HCS
> facility operations.
>
> Sorry if I'm asking obvious questions, but in Q1 2011 we will set up an HCS
> facility here on our campus, and I'll be the person deploying the IT
> infrastructure (storage/HPC) needed to run it. OMERO will play a key role
> in this scenario, so I'm basically learning here in preparation for the
> deployment.
>
> Thanks for your time and kind understanding,
>
> Alessandro
>
> >
> > -Curtis
> >
> > On Wed, Dec 8, 2010 at 9:00 AM, Alessandro Dellavedova <
> alessandro.dellavedova at ifom-ieo-campus.it> wrote:
> > Hi Rubén and list,
> >
> > > Some options to simplify the format have been discussed as follows:
> > >
> > >  - The master/slave approach. All files will reference the one that
> > >    contains the complete headers.
> >
> > Does it make sense to add a level of redundancy, for example requiring one
> > in ten files to carry the complete headers, in order to avoid losing the
> > metadata if the master file gets deleted/corrupted/abducted by aliens?
> >
> > Best,
> >
> > Alessandro
> >
> > _______________________________________________
> > ome-devel mailing list
> > ome-devel at lists.openmicroscopy.org.uk
> > http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
> >
> > On Wed, Dec 8, 2010 at 6:58 AM, Rubén Muñoz <ruben.munoz at embl.de> wrote:
> > Hi Andrew and list subscribers,
> >
> > I have some comments to add regarding the requirements for changes to
> > OME-TIFF and OME-XML. The current description of our issue is:
> >
> > * EMBL Screening (Ruben Muñoz, Jan Ellenberg)
> >
> >         * Not duplicating XML for each field, _plane_, etc.
> > I would like to add that our use case applies to every user of the
> > OME-TIFF multi-file export option.
> >
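> > We previously pointed out that the number of planes stored per OME-TIFF
> > has a big impact on each file's size. For multi-file datasets, the
> > conversion output can be far bigger than the raw data: every file repeats
> > the metadata for the whole set, so the total metadata grows roughly
> > quadratically with the number of files.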
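A toy model of why multi-file output balloons. It assumes, as in the one-plane-per-file setup described here, that every file embeds OME-XML describing all N planes in the set, as a fixed header plus one entry per plane. The sizes are placeholders: 0.25KB per plane is roughly what the 5.5MB-for-23,000-files estimate at the top of this thread implies, and the header size is invented. Per-file metadata then grows linearly with N and the total on disk with N squared.

```python
def xml_kb_per_file(n_files: int, header_kb: float, per_plane_kb: float) -> float:
    """Size of the OME-XML embedded in each file: a fixed header plus one
    entry per plane in the whole set (one plane per file)."""
    return header_kb + n_files * per_plane_kb


def total_xml_mb(n_files: int, header_kb: float, per_plane_kb: float) -> float:
    """Metadata summed over all files, in MB (1 MB = 1024 KB)."""
    return n_files * xml_kb_per_file(n_files, header_kb, per_plane_kb) / 1024


# Placeholder sizes, not measurements.
HEADER_KB, PER_PLANE_KB = 20.0, 0.25

for n in (1_000, 2_000, 4_000):
    print(n, "files:", round(total_xml_mb(n, HEADER_KB, PER_PLANE_KB), 1), "MB of XML")
```

Doubling the number of files roughly quadruples the total metadata, which is why the overhead quickly outgrows the pixel data itself.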
> >
> > At the EMBL-Heidelberg HCS Facility, we have used this as an internal
> > standard, with the prerequisite of having a single plane per file.
> > The reasons can be summarized as follows: it gives maximum compatibility
> > with software for image processing, online control of the microscope and
> > visualization, even after an instrument or power failure. This software
> > includes in-house developments (CellCognition, Micropilot, Cellbase) and
> > third-party projects (CellProfiler, ImageJ/Fiji).
> >
> > Given this scenario, we found OME-TIFF convenient because it has the
> > right conversion tools and an evolving metadata structure; in addition,
> > commercial adoption of the format is growing.
> >
> > In practice, much of the metadata consists of "<Plate>", "<Image>" and
> > "<Pixels>" elements (describing the Screen/Plate/Well structure, the
> > dimensionality, and the references to the files in the set).
> >
> > That can be prohibitive at both the processing and the storage stages.
> > Some options to simplify the format have been discussed as follows:
> >
> >  - The master/slave approach. All files will reference the one that
> >    contains the complete headers.
> >  - "<Plate>", "<Image>" and "<Pixels>" elements could be grouped when
> >    similar (e.g. described by a regular expression or pattern).
> >  - The "<Plate>", "<Image>" and "<Pixels>" elements could be extracted to
> >    a separate file.
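A minimal sketch of the second option, grouping similar entries behind a pattern. It is hypothetical: a Python format template with numeric ranges stands in for whatever pattern syntax a real schema extension would define, and the `expand` helper and the well/field naming scheme are invented for illustration. A handful of parameters replaces an explicit list that is only expanded when a reader needs it.

```python
from itertools import product
from typing import List


def expand(template: str, **ranges: range) -> List[str]:
    """Expand a filename template over the Cartesian product of its ranges.

    Keyword order sets the nesting: the first keyword varies slowest.
    """
    keys = list(ranges)
    return [
        template.format(**dict(zip(keys, values)))
        for values in product(*(ranges[k] for k in keys))
    ]


# Hypothetical 96-well plate imaged at 4 fields per well, one plane per file.
files = expand("W{well:03d}_F{field:02d}.tif", well=range(1, 97), field=range(1, 5))
print(len(files), files[0], files[-1])
```

Here one template and two ranges stand in for 384 enumerated file references, running from `W001_F01.tif` to `W096_F04.tif`.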
> >
> > Andrew supported the first alternative, and I suggested the second; the
> > third runs counter to the project's philosophy.
> >
> > Are there other suggestions? I would like to keep this discussion open
> > and help define more details if needed.
> >
> > Best,
> > Rubén
> >
> > On Dec 7, 2010, at 4:03 PM,
> > ome-devel-request at lists.openmicroscopy.org.uk wrote:
> >>
> >> Date: Tue, 7 Dec 2010 13:07:49 +0000
> >> From: Andrew Patterson <ajpatterson at lifesci.dundee.ac.uk>
> >> To: ome-devel at lists.openmicroscopy.org.uk,
> >>      ome-users at lists.openmicroscopy.org.uk
> >> Subject: [ome-devel] OME-XML Updates
> >> Message-ID:
> >>      <B5B2766B-2357-40C1-B1DD-06CCEC3A62C9 at lifesci.dundee.ac.uk>
> >> Content-Type: text/plain; charset=us-ascii
> >>
> >> Hello OME-XML & OME-TIFF users and potential users,
> >>
> >> We are in the process of compiling requirements for changes to the way
> >> our OME-XML and OME-TIFF formats work. This is in response to the new
> >> ways people want to use our formats, and to drawbacks they have come
> >> across when storing datasets in certain circumstances.
> >>
> >> Examples we have so far include:
> >> * storing large datasets, one plane per OME-TIFF: this is a valid way to
> >>   store data, but one which at the moment causes metadata duplication on
> >>   disk.
> >> * creating a 'lite' OME-TIFF for display or to pass to external
> >>   applications.
> >>
> >> A full list of our current thoughts is on the requirement ticket:
> >> http://trac.openmicroscopy.org.uk/omero/ticket/3535
> >>
> >> Some of these changes may affect key features of our formats, e.g. our
> >> current insistence that all metadata is stored in the same file as the
> >> image data.
> >>
> >> We would really like to have your input on this feature, or any others.
> >>
> >> If you have a use case that you think would help guide our future work,
> >> we would love to hear from you. If you reply on either of the mailing
> >> lists (OME-USER or OME-DEVEL), others will be able to see and join in!
> >>
> >> Thanks again for your help and support.
> >>
> >> Cheers,
> >>
> >> Andrew
> >>
> >> --
> >> Andrew Patterson
> >> ajpatterson at lifesci.dundee.ac.uk
> >> Software Developer, Open Microscopy Environment
> >> Wellcome Trust Centre for Gene Regulation & Expression, University of
> >> Dundee
> >
>
>