[ome-devel] ome-devel Digest, Vol 81, Issue 5
Jason Swedlow
jason at lifesci.dundee.ac.uk
Thu Dec 9 12:14:19 GMT 2010
Agreed-- this is one of those things where there is no perfect
strategy, just a best choice between a series of compromises.
Presumably, file locking isn't an issue, or gets dealt with in the
application Rubén has.
Rubén, usually we don't just issue a specification, saying, more or
less, "Do it this way", but also build and release the software that
supports the specification. That's important as it usually reveals
whether the modeling is correct, results in something relatively
performant, etc. In many cases, the model defines how the software is
built.
We have our weekly planning mtg this PM and will get back to you after
that.
Cheers,
Jason
On 8 Dec 2010, at 19:57, Curtis Rueden wrote:
> Hi Alessandro,
>
> > Based on your experience, how much of an increase in size could we
> > expect from "one in ten" or "one in a hundred" files carrying
> > redundant metadata? I think some estimates would be a great help in
> > understanding the impact of this implementation on IT departments
> > and the operation of HCS facilities.
>
> From what Rubén told me, a typical situation might be 2.5MB of
> binary data (pixels) per TIFF file, and 5.5MB of OME-XML. Over
> 23,000 TIFF files, that's 180GB when stored with metadata in every
> file, but only 56GB if the metadata is stored once only—more than 3X
> difference. Storing the metadata in 1/10th of the TIFFs would
> require ~69GB of storage, which amounts to nearly 13GB of wasted
> disk. Storing the metadata in 1/100th of the TIFFs would require
> ~57GB, wasting a mere 1GB of disk.
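
A minimal sketch of the arithmetic behind the estimates above. It assumes 1 GB = 1024 MB (the convention that reproduces the quoted totals) and that the replicated copies are spread evenly across the dataset. The function name and parameters are illustrative only and are not part of any OME tool.

```python
MB_PER_GB = 1024  # assumption: binary units, which match the figures quoted above


def dataset_size_gb(n_files, pixels_mb, xml_mb, replicate_every):
    """Total size in GB when one file in every `replicate_every` carries the OME-XML."""
    # Ceiling division, so at least one file always holds the metadata.
    copies = -(-n_files // replicate_every)
    total_mb = n_files * pixels_mb + copies * xml_mb
    return total_mb / MB_PER_GB


# Figures from the message: 23,000 TIFFs, 2.5MB of pixels and 5.5MB of OME-XML each.
N_FILES, PIXELS_MB, XML_MB = 23_000, 2.5, 5.5

# Baseline: the metadata is stored in a single master file only.
baseline_gb = dataset_size_gb(N_FILES, PIXELS_MB, XML_MB, N_FILES)

for label, every in [("every file", 1), ("1 in 10", 10),
                     ("1 in 100", 100), ("once only", N_FILES)]:
    size_gb = dataset_size_gb(N_FILES, PIXELS_MB, XML_MB, every)
    extra_gb = size_gb - baseline_gb
    print(f"{label:>10}: {size_gb:6.1f} GB total, {extra_gb:5.1f} GB above baseline")
```

Changing `replicate_every` shows how quickly the overhead falls as the replication interval grows, which is the configurable trade-off described in the next paragraph.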
>
> To be clear, I think it is fine to adopt such a strategy, but my
> point is that it should be the institution's choice. With the master/
> slave proposal, it would be totally configurable how often to
> replicate the OME-XML metadata. You could store the metadata for one
> file only, for all files, or for some subset as you propose.
>
> -Curtis
>
> On Wed, Dec 8, 2010 at 12:59 PM, Alessandro Dellavedova <alessandro.dellavedova at ifom-ieo-campus.it> wrote:
> Hi Curtis and Rubén,
>
> On Dec 8, 2010, at 5:51 PM, Curtis Rueden wrote:
>
> > Alessandro wrote:
> > Does it make sense to add a level of redundancy, for example
> > requiring one in ten files to carry the complete headers, in order
> > to avoid losing the metadata if the master file gets deleted,
> > corrupted or abducted by aliens?
> >
> > For large numbers of files, I think any mandated level of
> > redundancy will still result in an undesirable increase in size.
>
> Based on your experience, how much of an increase in size could we
> expect from "one in ten" or "one in a hundred" files carrying
> redundant metadata? I think some estimates would be a great help in
> understanding the impact of this implementation on IT departments
> and the operation of HCS facilities.
>
> Sorry for asking such obvious questions, but in Q1 2011 we will set
> up an HCS facility here on our campus, and I will be the person who
> has to deploy the IT infrastructure (storage/HPC) needed to run it.
> OMERO will play a key role in this scenario, so I'm basically
> learning here in preparation for the deployment.
>
> Thanks for your time and kind understanding,
>
> Alessandro
>
> >
> > -Curtis
> >
> > On Wed, Dec 8, 2010 at 9:00 AM, Alessandro Dellavedova <alessandro.dellavedova at ifom-ieo-campus.it> wrote:
> > Hi Rubén and list,
> >
> > > Some options to simplify the format have been discussed as follows:
> > >
> > > - The master/slave approach. All files will reference the one
> > >   that contains the complete headers.
> >
> > Does it make sense to add a level of redundancy, for example
> > requiring one in ten files to carry the complete headers, in order
> > to avoid losing the metadata if the master file gets deleted,
> > corrupted or abducted by aliens?
> >
> > Best,
> >
> > Alessandro
> >
> > _______________________________________________
> > ome-devel mailing list
> > ome-devel at lists.openmicroscopy.org.uk
> > http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
> >
> > On Wed, Dec 8, 2010 at 6:58 AM, Rubén Muñoz <ruben.munoz at embl.de> wrote:
> > Hi Andrew and list subscribers,
> >
> > I have some comments to add regarding the requirements for changes
> > to OME-TIFF and OME-XML. The current description of our issue is:
> >
> > * EMBL Screening (Ruben Muñoz, Jan Ellenberg)
> >
> > * Not duplicating XML for each field, _plane_, etc.
> > I would like to add that our use case applies to every user of the
> > OME-TIFF multi-file export option.
> >
> > We previously pointed out that the number of planes stored per
> > OME-TIFF has a big impact on each file's size. For multi-file
> > datasets, the conversion output will be much larger than the raw
> > data, because the metadata is repeated in every file.
> >
> > At the EMBL-Heidelberg HCS Facility, we have used this as our
> > internal standard, with the prerequisite of a single plane per file.
> > The reason is that it gives maximum compatibility with software for
> > image processing, online control of the microscope and
> > visualization, even after an instrument or power failure. This
> > software includes in-house developments (CellCognition, Micropilot,
> > Cellbase) and third-party projects (CellProfiler, ImageJ/FIJI).
> >
> > Given this scenario, we found OME-TIFF convenient because it has the
> > right conversion tools and an evolving metadata structure. In
> > addition, commercial adoption of the format is growing.
> >
> > In practice, much of the metadata consists of "<Plate>", "<Image>"
> > and "<Pixel>" elements (describing the SPW, the dimensionality and
> > the references to the files in the set).
> >
> > That can be prohibitive at the processing and the storage stage.
> > Some options to simplify the format have been discussed as follows:
> >
> > - The master/slave approach. All files will reference the one
> >   that contains the complete headers.
> > - "<Plate>", "<Image>" and "<Pixel>" elements could be grouped
> >   when similar (e.g. regular expressions following a pattern).
> > - The "<Plate>", "<Image>" and "<Pixel>" elements could be
> >   extracted to a separate file.
> >
> > The first alternative was supported by Andrew. I suggested the
> > second. The third runs counter to the project philosophy.
> >
> > Are there other suggestions? I would like to keep this discussion
> > open and to help define more details if needed.
> >
> > Best,
> > Rubén
> >
> > On Dec 7, 2010, at 4:03 PM, ome-devel-request at lists.openmicroscopy.org.uk wrote:
> >>
> >> Date: Tue, 7 Dec 2010 13:07:49 +0000
> >> From: Andrew Patterson <ajpatterson at lifesci.dundee.ac.uk>
> >> To: ome-devel at lists.openmicroscopy.org.uk,
> >> ome-users at lists.openmicroscopy.org.uk
> >> Subject: [ome-devel] OME-XML Updates
> >> Message-ID:
> >> <B5B2766B-2357-40C1-B1DD-06CCEC3A62C9 at lifesci.dundee.ac.uk>
> >> Content-Type: text/plain; charset=us-ascii
> >>
> >> Hello OME-XML & OME-TIFF users and potential users,
> >>
> >> We are in the process of compiling requirements for changes to
> >> the way our OME-XML and OME-TIFF formats work. This is in response
> >> to the new ways people want to use our formats, and to drawbacks
> >> they have come across when storing datasets in certain
> >> circumstances.
> >>
> >> Examples we have so far include:
> >> * storing large datasets, one plane per OME-TIFF: this is a valid
> >>   way to want to store data, but one which at the moment causes
> >>   metadata duplication on disk.
> >> * creating a 'lite' OME-TIFF for display or to pass to external
> >>   applications.
> >>
> >> A full list of our current thoughts is on the requirement ticket:
> >> http://trac.openmicroscopy.org.uk/omero/ticket/3535
> >>
> >> Some of these changes may affect key features of our formats,
> >> e.g. our current insistence that all metadata is stored in the same
> >> file as the image data.
> >>
> >> We would really like to have your input on this feature, or any
> >> others.
> >>
> >> If you have a use case that you think would help guide our future
> >> work, we would love to hear from you. If you can reply on either of
> >> the mailing lists (OME-USER or OME-DEVEL), it will let others see
> >> and join in!
> >>
> >> Thanks again for your help and support.
> >>
> >> Cheers,
> >>
> >> Andrew
> >>
> >> --
> >> Andrew Patterson
> >> ajpatterson at lifesci.dundee.ac.uk
> >> Software Developer, Open Microscopy Environment
> >> Wellcome Trust Centre for Gene Regulation & Expression,
> >> University of Dundee
**************************
Wellcome Trust Centre for Gene Regulation & Expression
College of Life Sciences
MSI/WTB/JBC Complex
University of Dundee
Dow Street
Dundee DD1 5EH
United Kingdom
phone (01382) 385819
Intl phone: 44 1382 385819
FAX (01382) 388072
email: jason at lifesci.dundee.ac.uk
Lab Page: http://gre.lifesci.dundee.ac.uk/staff/jason_swedlow.html
Open Microscopy Environment: http://openmicroscopy.org
**************************
The University of Dundee is a Scottish Registered Charity, No. SC015096.