[ome-devel] BinaryOnly Element (Was: ome-devel Digest, Vol 81, Issue 5)
Rubén Muñoz
ruben.munoz at embl.de
Fri Dec 17 20:41:06 GMT 2010
Hello to all,
We attach a joint support letter which, with the end of the year so close, has taken longer than expected.
Happy new year!
Rubén
-------------- next part --------------
A non-text attachment was scrubbed...
Name: EMBL Bioformats support letter.pdf
Type: application/pdf
Size: 32774 bytes
Desc: not available
URL: <http://lists.openmicroscopy.org.uk/pipermail/ome-devel/attachments/20101217/8be82d60/attachment-0001.pdf>
-------------- next part --------------
On Dec 17, 2010, at 5:27 PM, Josh Moore wrote:
> All,
>
> Hopefully in time for everyone to mull over during the holiday season, the OME team has put together an initial, simple proposal for a new top-level element, "BinaryOnly", which, though not yet published, represents step (1) outlined by Chris and option (3) from Rubén (see below):
>
> <ex1-binaryonly.zip>
>
> The proposal doesn't yet contain many of the other suggestions made in this thread (master/slave, wildcards, etc.), but should alleviate the most substantial issues of metadata duplication that are occurring while the community discusses future modifications.
>
> As always, we look forward to your feedback, and wish everyone a safe and happy end of 2010.
> ~Josh
>
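To make the idea in the message above concrete, here is a minimal Python sketch of the kind of stub a "BinaryOnly" element might allow: each TIFF would carry only a short pointer to a companion file holding the full metadata. The attribute names MetadataFile and UUID, the companion file name, and the helper binary_only_stub are assumptions made for illustration; the contents of the proposal's zip file are not shown in this thread.

```python
import uuid
import xml.etree.ElementTree as ET


def binary_only_stub(companion_name, companion_uuid):
    """Return a minimal OME-XML string that only points at a companion file.

    The attribute names MetadataFile and UUID are assumptions made for
    this sketch; the proposal's actual schema is not shown in the thread.
    """
    root = ET.Element("OME")
    ET.SubElement(root, "BinaryOnly", {
        "MetadataFile": companion_name,
        "UUID": companion_uuid,
    })
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical companion file name and a freshly generated UUID.
    companion_uuid = f"urn:uuid:{uuid.uuid4()}"
    print(binary_only_stub("plate1.companion.ome", companion_uuid))
```

A stub like this stays a few hundred bytes regardless of how many files the dataset contains, which is the point of moving the bulk of the metadata out of each TIFF.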
> On Dec 9, 2010, at 9:55 PM, Chris Allan wrote:
>
>> Hi all,
>>
>> Based on our internal discussions, feedback from the community and our attempts to come up with a straightforward solution to this problem, I think we, as the OME project, need to come to grips with the fact that preserving the "metadata with the data" dogma above all else isn't feasible. As a philosophy it was fine when we were not dealing with 5000-Image datasets and tens of thousands of OME-TIFFs. We must be more flexible, and we've already been more flexible in previous versions by allowing metadata-only OME-XML files.
>>
>> With that in mind the task outline is as follows:
>>
>> 1) Present to the community our preferred "companion file" strategy
>>
>> This will be in the form of a "preview" schema release designed to be forward compatible, together with examples of how to write OME-TIFF files that conform to the "companion file" strategy. This should allow those who are interested in the discussion surrounding how to deal with large multi-plane datasets to comment on a concrete example. Basically this is option (3) from Rubén's list below.
>>
>> We expect this to happen soon and we'll keep everyone apprised of our progress.
>>
>> 2) Make the schema options from (1) available via Bio-Formats
>>
>> There is already significant work that needs to be done to support large exports to OME-TIFF, so this is in the pipeline but probably will not be released officially until OMERO / Bio-Formats 4.3.0, which is not expected until the spring. Since those with a pressing need here also know how they would like things done, this is certainly where we can really use the help of people like Rubén. We'd be happy to take patches and input during this phase.
>>
>> 3) Make the schema options from (1) available via OMERO
>>
>> As (2) above but with further enhancements to support the enhanced export requirements of OMERO for 4.3.0.
>>
>>
>> While we're progressing on (1) we'll happily take on board alternative solutions and suggestions. Alessandro's ideas surrounding metadata replication we'll certainly take on board and see how easy they are to implement.
>>
>> The OME team appreciates everyone's feedback and interest greatly.
>>
>> Thanks.
>>
>> -Chris
>>
>> On 9 Dec 2010, at 12:14, Jason Swedlow wrote:
>>
>>> Agreed-- this is one of those things where there is no perfect strategy, just a best choice between a series of compromises. Presumably, file locking isn't an issue, or gets dealt with in the application Rubén has.
>>>
>>> Rubén, usually we don't just issue a specification, saying, more or less, "Do it this way", but also build and release the software that supports the specification. That's important as it usually reveals whether the modeling is correct, results in something relatively performant, etc. In many cases, the model defines how the software is built.
>>>
>>> We have our weekly planning mtg this PM and will get back to you after that.
>>>
>>> Cheers,
>>>
>>> Jason
>>>
>>>
>>> On 8 Dec 2010, at 19:57, Curtis Rueden wrote:
>>>
>>>> Hi Alessandro,
>>>>
>>>> Based on your experience, how much of an increase in size could we expect from "one in ten" or "one in a hundred" files with metadata redundancy? I think some estimates would be of great help in understanding the impact of this implementation on the operations of IT Departments and HCS Facilities.
>>>>
>>>> From what Rubén told me, a typical situation might be 2.5MB of binary data (pixels) per TIFF file, and 5.5MB of OME-XML. Over 23,000 TIFF files, that's 180GB when stored with metadata in every file, but only 56GB if the metadata is stored once only, more than a 3X difference. Storing the metadata in 1/10th of the TIFFs would require ~69GB of storage, which amounts to nearly 13GB of wasted disk. Storing the metadata in 1/100th of the TIFFs would require ~57GB, wasting a mere 1GB of disk.
>>>>
>>>> To be clear, I think it is fine to adopt such a strategy, but my point is that it should be the institution's choice. With the master/slave proposal, it would be totally configurable how often to replicate the OME-XML metadata. You could store the metadata for one file only, for all files, or for some subset as you propose.
>>>>
>>>> -Curtis
>>>>
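The storage figures in the message above can be reproduced with a short calculation. The following is a rough Python sketch, not anything taken from Bio-Formats or OMERO: the function name dataset_size_gb and its every_nth parameter are hypothetical, and it assumes 1 GB = 1024 MB, which is the convention that appears to match the quoted totals.

```python
# Rough storage estimate for a multi-file OME-TIFF dataset.
# Per-file figures come from the message above; 1 GB is taken as
# 1024 MB, an assumption that reproduces the quoted totals.

PIXELS_MB = 2.5      # binary data per TIFF
METADATA_MB = 5.5    # OME-XML block per TIFF that carries it
N_FILES = 23_000


def dataset_size_gb(n_files, every_nth):
    """Total size when one file in `every_nth` carries the full OME-XML.

    every_nth=1 means every file carries it; every_nth=None means a
    single master copy for the whole dataset.
    """
    if every_nth is None:
        copies = 1
    else:
        copies = -(-n_files // every_nth)  # ceiling division
    total_mb = n_files * PIXELS_MB + copies * METADATA_MB
    return total_mb / 1024


if __name__ == "__main__":
    for label, nth in [("every file", 1), ("1 in 10", 10),
                       ("1 in 100", 100), ("single copy", None)]:
        print(f"{label:>12}: {dataset_size_gb(N_FILES, nth):6.1f} GB")
```

Rounded to whole gigabytes, the four cases come out at about 180, 69, 57 and 56, in line with the numbers quoted. Because the replication interval is just a parameter, the same function covers the configurable master/slave scheme described in the message.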
>>>> On Wed, Dec 8, 2010 at 12:59 PM, Alessandro Dellavedova <alessandro.dellavedova at ifom-ieo-campus.it> wrote:
>>>> Hi Curtis and Rubén,
>>>>
>>>> On Dec 8, 2010, at 5:51 PM, Curtis Rueden wrote:
>>>>
>>>>> Alessandro wrote:
>>>>> Does it make sense to add a level of redundancy like, for example, one in ten files has to carry the complete headers, in order to avoid the loss of metadata info if the master file got deleted/corrupted/abducted by aliens ?
>>>>>
>>>>> For large numbers of files, I think any mandated level of redundancy will still result in an undesirable increase in size.
>>>>
>>>> Based on your experience, how much of an increase in size could we expect from "one in ten" or "one in a hundred" files with metadata redundancy? I think some estimates would be of great help in understanding the impact of this implementation on the operations of IT Departments and HCS Facilities.
>>>>
>>>> Sorry for asking such obvious questions, but in Q1 2011 we will set up an HCS Facility here at our Campus and I'll be the person who has to deploy the IT infrastructure (storage/HPC) needed to run it. OMERO will play a key role in this scenario, so I'm basically learning here in preparation for the deployment.
>>>>
>>>> Thanks for your time and kind understanding,
>>>>
>>>> Alessandro
>>>>
>>>>>
>>>>> -Curtis
>>>>>
>>>>> On Wed, Dec 8, 2010 at 9:00 AM, Alessandro Dellavedova <alessandro.dellavedova at ifom-ieo-campus.it> wrote:
>>>>> Hi Rubén and list,
>>>>>
>>>>>> Some options to simplify the format have been discussed as follows:
>>>>>>
>>>>>> - The master/slave approach. All files will reference the one that contains the complete headers.
>>>>>
>>>>> Does it make sense to add a level of redundancy like, for example, one in ten files has to carry the complete headers, in order to avoid the loss of metadata info if the master file got deleted/corrupted/abducted by aliens ?
>>>>>
>>>>> Best,
>>>>>
>>>>> Alessandro
>>>>>
>>>>> _______________________________________________
>>>>> ome-devel mailing list
>>>>> ome-devel at lists.openmicroscopy.org.uk
>>>>> http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
>>>>>
>>>>> On Wed, Dec 8, 2010 at 6:58 AM, Rubén Muñoz <ruben.munoz at embl.de> wrote:
>>>>> Hi Andrew and list subscribers,
>>>>>
>>>>> I have some comments to add regarding the OME.TIFF and OME.XML requirements for changes. The current description of our issue is:
>>>>>
>>>>> * EMBL Screening (Ruben Muñoz, Jan Ellenberg)
>>>>>
>>>>> * Not duplicating XML for each field, _plane_, etc.
>>>>> I would like to add that our use case will apply to every user of the OME.TIF multi-file export option.
>>>>>
>>>>> We previously pointed out that the number of planes stored per OME-TIFF has a big impact on each file's size. For multi-file datasets, the conversion output will be many times bigger than the raw data.
>>>>>
>>>>> At the EMBL-Heidelberg HCS Facility, we have used this as an internal standard, with the prerequisite of having one single plane per file.
>>>>> The reasons can be summarized as follows: it gives maximum compatibility with software for image processing, online control of the microscope and visualization, even after instrument or power failure. This software includes in-house developments (CellCognition, Micropilot, Cellbase) and third-party projects (CellProfiler, ImageJ/Fiji).
>>>>>
>>>>> Given this scenario, we found OME.TIF convenient because it has the correct conversion tools and an evolving metadata structure. In addition, the commercial adoption of the format is growing.
>>>>>
>>>>> In practice, much of the metadata consists of "<Plate>", "<Image>" and "<Pixel>" elements (describing the SPW, the dimensionality and the references to the files in the set).
>>>>>
>>>>> That can be prohibitive at both the processing and the storage stages.
>>>>> Some options to simplify the format have been discussed as follows:
>>>>>
>>>>> - The master/slave approach. All files will reference the one that contains the complete headers.
>>>>> - "<Plate>", "<Image>" and "<Pixel>" elements could be grouped when similar (e.g. with regular expressions following a pattern)
>>>>> - The "<Plate>", "<Image>" and "<Pixel>" could be extracted to a separate file.
>>>>>
>>>>> The first alternative was supported by Andrew. I suggested the second, but the project philosophy runs counter to the third.
>>>>>
>>>>> Are there other suggestions? I would like to keep this discussion open and to help to define more details if needed.
>>>>>
>>>>> Best,
>>>>> Rubén
>>>>>
>>>>> On Dec 7, 2010, at 4:03 PM, ome-devel-request at lists.openmicroscopy.org.uk wrote:
>>>>>>
>>>>>> Date: Tue, 7 Dec 2010 13:07:49 +0000
>>>>>> From: Andrew Patterson <ajpatterson at lifesci.dundee.ac.uk>
>>>>>> To: ome-devel at lists.openmicroscopy.org.uk,
>>>>>> ome-users at lists.openmicroscopy.org.uk
>>>>>> Subject: [ome-devel] OME-XML Updates
>>>>>> Message-ID:
>>>>>> <B5B2766B-2357-40C1-B1DD-06CCEC3A62C9 at lifesci.dundee.ac.uk>
>>>>>> Content-Type: text/plain; charset=us-ascii
>>>>>>
>>>>>> Hello OME-XML & OME-TIFF users and potential users,
>>>>>>
>>>>>> We are in the process of compiling requirements for changes to the way our OME-XML and OME-TIFF formats work. This is in response to the new ways people are wanting to use our formats, and drawbacks they have come across when storing datasets in certain circumstances.
>>>>>>
>>>>>> Examples we have so far include:
>>>>>> * storing large datasets, one plane per OME-TIFF: this is a valid way to want to store data, but one which at the moment causes metadata duplication on disk.
>>>>>> * creating a 'lite' OME-TIFF for display or to pass to external applications.
>>>>>>
>>>>>> A full list of our current thoughts is on the requirement ticket:
>>>>>> http://trac.openmicroscopy.org.uk/omero/ticket/3535
>>>>>>
>>>>>> Some of these changes may affect key features of our formats, e.g. our current insistence that all metadata is stored in the same file as the image data.
>>>>>>
>>>>>> We would really like to have your input on this feature, or any others.
>>>>>>
>>>>>> If you have a use case that you think would help guide our future work, we would love to hear from you. If you can reply on either of the mailing lists (OME-USER or OME-DEVEL), it will let others see and join in!
>>>>>>
>>>>>> Thanks again for your help and support.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Andrew
>>>>>>
>