[ome-devel] OMERO.features API development

Fri May 9 15:43:35 BST 2014

Hi all,

     I have been following the discussion on OMERO.features, as it falls 
quite close to issues we have and things we are doing, even though I did 
not participate. I thought I would chime in to say what we are up to in 
this domain; we'll be in Paris for the OME user meeting, likely with a 
poster, and happy to discuss...

     We are doing yeast genome-wide HT/HC microscopy studies, and up to 
now we've used a SQL database to store cell-wise features extracted from 
images (as we are just one lab with coherent experiments, but a lot of 
data, it sounded like a good idea). We are currently looking into Neo4j 
as a more flexible solution, and working on an HTML5 front-end to open up 
visualisation and mining to the wider audience of experimental 
biologists. Although I haven't used it that deeply yet, I quite like 
Neo4j, as it keeps some structure in the data while not being as 
rigid as SQL. I guess we went the opposite way and stored all 
relevant information in the SQL/Neo4j DB, with the link to OMERO 
provided by the stored IDs of the OMERO objects.
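
As an illustration only of linking externally stored features back to OMERO by ID, here is a minimal sketch using Python's built-in sqlite3 module. It is not the lab's actual schema: the table name, column names and values are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per segmented cell, linked back to OMERO
# only through the numeric ID of the image the cell was found in.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cell_features (
        cell_id        INTEGER PRIMARY KEY,
        omero_image_id INTEGER NOT NULL,  -- ID of the Image object in OMERO
        area           REAL,
        mean_intensity REAL
    )
    """
)
conn.execute("CREATE INDEX idx_image ON cell_features (omero_image_id)")

# Made-up example values.
rows = [
    (1, 123, 41.5, 0.82),
    (2, 123, 38.0, 0.79),
    (3, 456, 52.3, 0.64),
]
conn.executemany("INSERT INTO cell_features VALUES (?, ?, ?, ?)", rows)


def features_for_image(connection, image_id):
    """Return (cell_id, area, mean_intensity) for every cell in one image."""
    cursor = connection.execute(
        "SELECT cell_id, area, mean_intensity FROM cell_features "
        "WHERE omero_image_id = ? ORDER BY cell_id",
        (image_id,),
    )
    return cursor.fetchall()


print(features_for_image(conn, 123))
```

With this layout any metadata needed for filtering lives next to the features, and OMERO is only consulted when the image itself is needed.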

    Again, happy to talk in Paris about those things, which are indeed a 
common pain in the community...

Cheers
Anatole

On 08/05/2014 14:40, Simon Li wrote:
> Hi all
>
> I've started a Github repository for trying out some OMERO.features 
> ideas based on what I mentioned in the last email:
> https://github.com/manics/omero-features
>
> There's not a great deal in there at the moment. It just saves 
> features into a local HDF5 file using PyTables, and example.py creates 
> a table similar to that used by Pyslid (OMERO.searcher). timings.txt 
> shows some rough run-times. Row key-value pairs are mapped to table 
> columns; however, this means each row has to have the same 
> keys. There's no simple way to have a key-value map per column, so for 
> now I'm just storing multiple features in one column.
>
> This is easily convertible to OMERO.tables. Columns could be labelled 
> using OMERO annotations (in 5.1 there's a new MapAnnotation), though 
> this effectively means each group of features is stored separately and 
> thus would need to be queried separately. Alternatively an auxiliary 
> table could be used to store the per-column key-value pairs, similar 
> to how column descriptions are currently stored in OMERO.tables.
>
> A major limitation is that database joins between OMERO and a 
> feature table aren't practical. For example, if each feature row is 
> labelled with an image ID and you want to select a subset of rows 
> using an OMERO query, you have to pass a list of image IDs to the 
> PyTables query function, which, from my initial testing, is very limited 
> in the number of parameters it'll handle (I get a stack overflow if 
> too many image IDs are passed).
>
> In practice this means you'd either need the feature table to contain 
> any metadata necessary for selecting rows (e.g. dataset ID, experiment 
> parameters, annotations) even if this means duplicating information 
> held in OMERO, or split the query up (very inefficient). This is 
> probably fine for people dealing with features in bulk where you might 
> download all features for a screen for offline processing, not so good 
> for real-time searching such as OMERO.searcher where you'd either need 
> to store everything you need for pre-filtering search results in the 
> table, or read all features and do the filtering afterwards.
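
The "split the query up" option mentioned above can be sketched as follows. This is a hedged illustration rather than code from the omero-features repository: `chunked`, `build_condition`, `select_rows` and `fake_query` are hypothetical names, and `fake_query` is a pure-Python stand-in so the example runs without an HDF5 file. With PyTables, the `query` argument might instead wrap something like `Table.read_where`.

```python
def chunked(ids, size):
    """Yield successive lists of at most `size` IDs."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_condition(column, ids):
    """Build a condition such as '(image_id == 1) | (image_id == 2)'."""
    return " | ".join("(%s == %d)" % (column, i) for i in ids)


def select_rows(query, column, ids, chunk_size=50):
    """Run `query(condition)` once per chunk and concatenate the results."""
    results = []
    for chunk in chunked(ids, chunk_size):
        results.extend(query(build_condition(column, chunk)))
    return results


# Stand-in data and query function (assumptions for this example only).
rows = [{"image_id": i, "value": i * 0.5} for i in range(1, 201)]


def fake_query(condition):
    # Evaluates the condition string against each row dictionary; a real
    # back-end would hand the string to its own query engine instead.
    return [r for r in rows if eval(condition, {}, r)]


selected = select_rows(fake_query, "image_id", [3, 77, 150, 999], chunk_size=2)
print([r["image_id"] for r in selected])
```

Each chunk costs a separate pass over the table, which is the inefficiency noted above; the chunk size only bounds how many IDs go into any one condition.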
>
> Probably OK as far as developing the API is concerned, but longer term 
> it suggests we need some other storage mechanism. Some of you will 
> remember Joaquin Correa from Paris last year. He's currently working 
> on his own feature storage implementation at LBL, so potentially this 
> is something we could look at for OMERO, and of course there are many 
> other possibilities.
>
> Simon
>
>
>
> On 24 Apr 2014, at 12:57, Lee Kamentsky <leek at broadinstitute.org> wrote:
>
>> Hi all,
>> Just chiming in, since we were mentioned...
>>
>> On Wed, Apr 23, 2014 at 5:10 PM, Simon Li <s.p.li at dundee.ac.uk> wrote:
>>
>>     Hi all
>>
>>     Now that OMERO 5.0 is out of the way, and OMERO.searcher and
>>     WND-CHRM are either released or very close to release, I think
>>     it's time to restart our OMERO.features discussions.
>>
>>     We got as far as the idea of a 2D table with any number of
>>     key-value pairs on each column and row, so for example each row
>>     could be as simple as (OmeroType: Image, OmeroId: 123), or in the
>>     case of features which are a function of multiple images or
>>     channels (OmeroType: Image, OmeroId: 123, Channel1: 0, Channel2:
>>     3), etc. Columns could for example be (FeatureFamily: WndCharm,
>>     Feature: Zernike). Each table cell could either be a scalar or
>>     array. Retrieving features could be done by providing key-value
>>     pairs to be matched.
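
The conceptual table described above can be mocked up in a few lines of Python. This is only a toy in-memory sketch under stated assumptions: the `FeatureTable` class and its method names are hypothetical, the feature values are made up, and nothing here reflects an actual OMERO API or back-end.

```python
def matches(metadata, criteria):
    """True if every key-value pair in `criteria` is present in `metadata`."""
    return all(key in metadata and metadata[key] == value
               for key, value in criteria.items())


class FeatureTable:
    """Toy 2D table with a key-value map attached to every row and column."""

    def __init__(self):
        self.row_meta = []   # one dict of key-value pairs per row
        self.col_meta = []   # one dict of key-value pairs per column
        self.cells = {}      # (row index, column index) -> scalar or list

    def add_row(self, **metadata):
        self.row_meta.append(metadata)
        return len(self.row_meta) - 1

    def add_column(self, **metadata):
        self.col_meta.append(metadata)
        return len(self.col_meta) - 1

    def store(self, row, col, value):
        self.cells[(row, col)] = value

    def fetch(self, row_criteria, col_criteria):
        """Return {(row, col): value} for rows and columns matching the criteria."""
        rows = [i for i, m in enumerate(self.row_meta) if matches(m, row_criteria)]
        cols = [j for j, m in enumerate(self.col_meta) if matches(m, col_criteria)]
        return {(r, c): self.cells[(r, c)]
                for r in rows for c in cols if (r, c) in self.cells}


table = FeatureTable()
single = table.add_row(OmeroType="Image", OmeroId=123)
pair = table.add_row(OmeroType="Image", OmeroId=123, Channel1=0, Channel2=3)
zernike = table.add_column(FeatureFamily="WndCharm", Feature="Zernike")
table.store(single, zernike, [0.1, 0.2, 0.3])   # made-up values
table.store(pair, zernike, [0.4, 0.5, 0.6])     # made-up values

print(table.fetch({"OmeroId": 123, "Channel2": 3}, {"Feature": "Zernike"}))
```

Matching on a subset of keys means a query for `OmeroId: 123` alone returns both rows, which is one place where the redundancy/ambiguity of the scheme shows up.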
>>
>>     All of this is still up for discussion, especially since the
>>     implementation of this interface could be challenging and there's
>>     some redundancy/ambiguity. Just to be clear, the above is a
>>     conceptual description of how the API would appear to users; the
>>     actual back-end could be completely different.
>>
>>     Lee Kamentsky gave us a use case just before Christmas [1]; Chris
>>     Coletta and Ivan Cao-Berg are planning to summarise how they see
>>     WND-CHARM and OMERO.searcher fitting in. I know a few other
>>     people are interested in this discussion, so feel free to respond
>>     here or in the forums.
>>
>>
>> For us, it's important to link features to regions of interest, 
>> specifically segmentations of whole cells and cellular compartments. 
>> The other issues have to do with scalability and the efficiency of 
>> retrieving large data sets, either by selecting a few features for a 
>> large number of images (e.g. up to the order of 1,000,000 images 
>> and 1,000 entries per feature per image) or by selecting many or all 
>> features associated with a subset of the regions of interest.
>>
>> We are also interested in recording tracking data. What's needed here 
>> is the ability to record a link between a region of interest in one 
>> frame of a time-series stack and a region of interest in a later 
>> frame, with the flexibility of a many-to-many relationship to 
>> represent cell division and potentially merging. I'm fairly confident 
>> that you could encode that sort of thing in a 2-D table which had 
>> columns referencing both ROIs.
>>
>> Finally, we try to capture enough information about the analysis to 
>> make it reproducible - things like the pipeline used for the 
>> analysis, the Git hash of the software used to run the analysis and 
>> of each image analyzed. I think all of that is easily captured, 
>> though, in the tables and I doubt we need any explicit functionality 
>> devoted to that. It might be nice to be able to annotate the table 
>> itself with attributes in order to document the linking of the 
>> analysis results to the experimental protocol, but the linking could 
>> be documented using columns in an experiment-wide table.
>>
>>
>>     A few of us are planning to meet up at the OME Paris meeting; if
>>     you're interested, drop me an email.
>>
>>     Thanks
>>
>>     Simon
>>
>>     [1]
>>     http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2013-November/002573.html
>>
>>
>>     On 7 Nov 2013, at 14:20, Simon Li <s.p.li at dundee.ac.uk> wrote:
>>
>>     > Some notes from our meeting yesterday:
>>     >
>>     > http://www.openmicroscopy.org/site/community/minutes/minigroup/omero-features-meetings/2013-11-06-omero-features-google-hangout
>>     >
>>     > Summary:
>>     > We're thinking of representing features as a 2D array, with
>>     > metadata stored as key-value maps attached to the array, or
>>     > individual columns or rows. These keys could describe things such
>>     > as the feature name (column), sample metadata (row), algorithm
>>     > parameters, calculation pipelines, etc.
>>     >
>>     > This should work as an OMERO API- in order to retrieve features
>>     > you'd pass in a set of key-value pairs, for instance to specify
>>     > which features you want and which images/ROIs etc, and OMERO
>>     > would handle the logic and return the feature table(s) matching
>>     > those parameters. Since everyone has different requirements the
>>     > keys could be anything, however we're trying to define a small
>>     > set of standard keys- any suggestions are very welcome.
>>     >
>>     > Outside of OMERO we still need a format for transporting
>>     > features, so we're thinking some form of HDF5.
>>     >
>>     > Simon
>>     >
>>     >
>>     > The University of Dundee is a registered Scottish Charity, No:
>>     > SC015096
>>     > _______________________________________________
>>     > ome-devel mailing list
>>     > ome-devel at lists.openmicroscopy.org.uk
>>     > http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
>>
>>
>>
>
>
> -- 
> Anatole Chessel, PhD
> Research associate
> University of Cambridge
> Tennis Court Road, Cambridge CB2 1PD
> tel: +44 (0)1223 334065