[ome-devel] OMERO.features API development
Anatole Chessel
ac744 at cam.ac.uk
Fri May 9 15:43:35 BST 2014
Hi all,
I have been following the discussion on OMERO.features, as it falls
quite close to issues we have and things we are doing, even though I
did not participate. I thought I would chime in to say what we are up
to in this domain; we'll be in Paris for the OME user meeting, likely
with a poster, and happy to discuss...
We are doing yeast genome-wide HT/HC microscopy studies, and up to
now we've used a SQL database to store cell-wise features extracted from
images (as we are just one lab with coherent experiments, but a lot of
data, it sounded like a good idea). We are currently looking into Neo4j
as a more flexible solution, and working on an HTML5 front-end enabling
visualisation and mining for the wider audience of experimental
biologists. Although I haven't used it that deeply yet, I quite like
Neo4j as it keeps some structure in the data while not being as
rigid as SQL. And I guess we went the opposite way and stored all
relevant information in the SQL/Neo4j DB, with the link to OMERO being
provided by the IDs of OMERO objects stored in it.
Again, happy to talk in Paris about those things, which are indeed a
common pain in the community...
Cheers
Anatole
On 08/05/2014 14:40, Simon Li wrote:
> Hi all
>
> I've started a GitHub repository for trying out some OMERO.features
> ideas based on what I mentioned in the last email:
> https://github.com/manics/omero-features
>
> There's not a great deal in there at the moment. It just saves
> features into a local HDF5 file using PyTables, and example.py creates
> a table similar to that used by PySLID (OMERO.searcher). timings.txt
> shows some rough run-times. Key-value row pairs are mapped to table
> columns; however, this means each row has to have the same
> keys. There's no simple way to have a key-value map per column, so for
> now I'm just storing multiple features in one column.
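
The same-keys constraint described above can be illustrated with a minimal plain-Python sketch. It does not use PyTables and is not taken from the omero-features repository; the function name rows_to_columns and the reserved column name "features" are assumptions made for illustration only.

```python
# Plain-Python sketch (not PyTables): each row's key-value pairs become
# metadata columns, so every row must carry exactly the same keys.


def rows_to_columns(rows):
    """Convert a list of (keys_dict, feature_values) into column lists."""
    if not rows:
        return {}
    expected = sorted(rows[0][0])
    columns = {key: [] for key in expected}
    # Hypothetical reserved name: all feature values share one array column.
    columns["features"] = []
    for keys, values in rows:
        if sorted(keys) != expected:
            raise ValueError(
                "row keys %r differ from %r" % (sorted(keys), expected))
        for key in expected:
            columns[key].append(keys[key])
        columns["features"].append(list(values))
    return columns


table = rows_to_columns([
    ({"OmeroType": "Image", "OmeroId": 123}, [0.1, 0.2, 0.3]),
    ({"OmeroType": "Image", "OmeroId": 124}, [0.4, 0.5, 0.6]),
])
```

A row carrying an extra key such as Channel1 would raise ValueError here, which is the restriction mentioned above.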
>
> This is easily convertible to OMERO.tables. Columns could be labelled
> using OMERO annotations (in 5.1 there's a new MapAnnotation), though
> it effectively means each group of features is stored separately and
> thus would need to be queried separately. Alternatively an auxiliary
> table could be used to store the per-column key-value pairs, similar
> to how column descriptions are currently stored in OMERO.tables.
>
> A major limitation is that database joins between OMERO and a
> feature-table aren't practical. For example, if each feature row is
> labelled with an image ID, and you want to select a subset of rows
> using an OMERO query, you have to pass a list of image IDs to the
> PyTables query function, which from my initial testing is very limited
> in the number of parameters it'll handle (I get a stack overflow if
> too many image IDs are passed).
>
> In practice this means you'd either need the feature table to contain
> any metadata necessary for selecting rows (e.g. dataset ID, experiment
> parameters, annotations) even if this means duplicating information
> held in OMERO, or split the query up (very inefficient). This is
> probably fine for people dealing with features in bulk, where you might
> download all features for a screen for offline processing. It is not so
> good for real-time searching such as OMERO.searcher, where you'd either
> need to store everything you need for pre-filtering search results in
> the table, or read all features and do the filtering afterwards.
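
The "split the query up" option can be sketched as follows. This only builds condition strings in the syntax PyTables accepts for in-kernel queries (for example via Table.read_where); it does not open a table. The column name image_id and the batch size of 50 are assumptions, not values from the thread.

```python
# Sketch: build one PyTables-style condition string per batch of image IDs,
# so that no single query has to carry the whole ID list.


def chunked(ids, size):
    """Yield consecutive slices of ids, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def id_conditions(ids, column="image_id", size=50):
    """Yield strings such as '(image_id == 1) | (image_id == 2)'."""
    for batch in chunked(list(ids), size):
        yield " | ".join("(%s == %d)" % (column, int(i)) for i in batch)


conditions = list(id_conditions(range(1, 6), size=2))
# Each string would be run as a separate query and the results concatenated.
```

Running one query per batch is the inefficiency noted above: the table is scanned once per batch rather than once overall.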
>
> Probably OK as far as developing the API is concerned, but longer term
> it suggests we need some other storage mechanism. Some of you will
> remember Joaquin Correa from Paris last year. He's currently working
> on his own feature storage implementation at LBL, so potentially this
> is something we could look at for OMERO, and of course there are many
> other possibilities.
>
> Simon
>
>
>
> On 24 Apr 2014, at 12:57, Lee Kamentsky <leek at broadinstitute.org> wrote:
>
>> Hi all,
>> Just chiming in, since we were mentioned...
>>
>> On Wed, Apr 23, 2014 at 5:10 PM, Simon Li <s.p.li at dundee.ac.uk> wrote:
>>
>> Hi all
>>
>> Now that OMERO 5.0 is out of the way, and OMERO.searcher and
>> WND-CHRM are either released or very close to release, I think
>> it's time to restart our OMERO.features discussions.
>>
>> We got as far as the idea of a 2D table with any number of
>> key-value pairs on each column and row, so for example each row
>> could be as simple as (OmeroType: Image, OmeroId: 123), or in the
>> case of features which are a function of multiple images or
>> channels (OmeroType: Image, OmeroId: 123, Channel1: 0, Channel2:
>> 3), etc. Columns could for example be (FeatureFamily: WndCharm,
>> Feature: Zernike). Each table cell could either be a scalar or
>> array. Retrieving features could be done by providing key-value
>> pairs to be matched.
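
The conceptual interface described above (key-value pairs on rows and columns, retrieval by matching) might look like the following in-memory sketch. The class FeatureTable, its select method and the "Haralick" column are hypothetical names for illustration; as the message says, the real back-end could be completely different.

```python
# In-memory sketch of a 2D feature table with key-value metadata on every
# row and column; cells may be scalars or arrays.


class FeatureTable:
    def __init__(self, row_meta, col_meta, values):
        self.row_meta = row_meta  # one dict of key-value pairs per row
        self.col_meta = col_meta  # one dict of key-value pairs per column
        self.values = values      # values[r][c] is a scalar or a list

    @staticmethod
    def _matches(meta, wanted):
        """True if every wanted key is present in meta with the same value."""
        return all(k in meta and meta[k] == v for k, v in wanted.items())

    def select(self, rows=None, cols=None):
        """Return the sub-table whose row/column metadata match the pairs."""
        rows = rows or {}
        cols = cols or {}
        r_idx = [i for i, m in enumerate(self.row_meta)
                 if self._matches(m, rows)]
        c_idx = [j for j, m in enumerate(self.col_meta)
                 if self._matches(m, cols)]
        return [[self.values[i][j] for j in c_idx] for i in r_idx]


table = FeatureTable(
    row_meta=[
        {"OmeroType": "Image", "OmeroId": 123},
        {"OmeroType": "Image", "OmeroId": 123, "Channel1": 0, "Channel2": 3},
    ],
    col_meta=[
        {"FeatureFamily": "WndCharm", "Feature": "Zernike"},
        {"FeatureFamily": "WndCharm", "Feature": "Haralick"},
    ],
    values=[[[0.1, 0.2], [1.5]],
            [[0.3, 0.4], [2.5]]],
)
zernike = table.select(rows={"OmeroId": 123}, cols={"Feature": "Zernike"})
```

Matching on OmeroId alone returns both rows, which shows the kind of ambiguity mentioned in the next paragraph: a caller has to add Channel1 and Channel2 to pick out the two-channel feature row.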
>>
>> All of this is still up for discussion, especially since the
>> implementation of this interface could be challenging and there's
>> some redundancy/ambiguity. Just to be clear, the above is a
>> conceptual description of how the API would appear to users, the
>> actual back-end could be completely different.
>>
>> Lee Kamentsky gave us a use case just before Christmas [1], Chris
>> Coletta and Ivan Cao-Berg are planning to summarise how they see
>> WND-CHARM and OMERO.searcher fitting in. I know a few other
>> people are interested in this discussion, so feel free to respond
>> here or in the forums.
>>
>>
>> For us, it's important to link features to regions of interest,
>> specifically segmentations of whole cells and cellular compartments.
>> The other issues have to do with scalability and the efficiency of
>> retrieving large data sets either by selecting a few features for a
>> large number of images (e.g. up to on the order of 1,000,000 images
>> and 1,000 entries per feature per image) or by selecting many or all
>> features associated with a subset of the regions of interest.
>>
>> We are also interested in recording tracking data. What's needed here
>> is the ability to record a link between the region of interest in one
>> frame of a time-series stack and a region of interest in a later
>> frame. You need the flexibility of a many-to-many relationship to
>> represent cell division and potentially merging. I'm fairly confident
>> that you could encode that sort of thing in a 2-D table which had
>> columns referencing both ROIs.
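
A small sketch of such a link table, assuming each row holds just two ROI IDs (source and target); the IDs and the helper name group_links are invented for illustration. A division is a source ROI with several targets, and a merge is a target ROI with several sources.

```python
from collections import defaultdict

# Each row links an ROI in one frame to an ROI in a later frame.
# The ROI IDs below are made up.
links = [
    (10, 20),  # ROI 10 continues as ROI 20
    (11, 21),  # ROI 11 divides into ROIs 21 and 22
    (11, 22),
    (20, 30),  # ROIs 20 and 21 merge into ROI 30
    (21, 30),
]


def group_links(rows):
    """Index the link rows by source ROI and by target ROI."""
    children = defaultdict(list)
    parents = defaultdict(list)
    for source_roi, target_roi in rows:
        children[source_roi].append(target_roi)
        parents[target_roi].append(source_roi)
    return children, parents


children, parents = group_links(links)
divisions = {roi: kids for roi, kids in children.items() if len(kids) > 1}
merges = {roi: srcs for roi, srcs in parents.items() if len(srcs) > 1}
```

Because nothing restricts how often an ROI ID appears in either column, the same two-column table covers one-to-one tracks, divisions and merges.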
>>
>> Finally, we try to capture enough information about the analysis to
>> make it reproducible - things like the pipeline used for the
>> analysis, the Git hash of the software used to run the analysis and
>> of each image analyzed. I think all of that is easily captured,
>> though, in the tables and I doubt we need any explicit functionality
>> devoted to that. It might be nice to be able to annotate the table
>> itself with attributes in order to document the linking of the
>> analysis results to the experimental protocol, but the linking could
>> be documented using columns in an experiment-wide table.
>>
>>
>> A few of us are planning to meet up at the OME Paris meeting- if
>> you're interested drop me an email.
>>
>> Thanks
>>
>> Simon
>>
>> [1]
>> http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2013-November/002573.html
>>
>>
>> On 7 Nov 2013, at 14:20, Simon Li <s.p.li at dundee.ac.uk> wrote:
>>
>> > Some notes from our meeting yesterday:
>> > http://www.openmicroscopy.org/site/community/minutes/minigroup/omero-features-meetings/2013-11-06-omero-features-google-hangout
>> >
>> > Summary:
>> > We're thinking of representing features as a 2D array, with
>> > metadata stored as key-value maps attached to the array, or
>> > individual columns or rows. These keys could describe things such
>> > as the feature name (column), sample metadata (row), algorithm
>> > parameters, calculation pipelines, etc.
>> >
>> > This should work as an OMERO API- in order to retrieve features
>> > you'd pass in a set of key-value pairs, for instance to specify
>> > which features you want and which images/ROIs etc, and OMERO
>> > would handle the logic and return the feature table(s) matching
>> > those parameters. Since everyone has different requirements the
>> > keys could be anything, however we're trying to define a small
>> > set of standard keys- any suggestions are very welcome.
>> >
>> > Outside of OMERO we still need a format for transporting
>> > features, so we're thinking some form of HDF5.
>> >
>> > Simon
>> >
>> >
>> > The University of Dundee is a registered Scottish Charity, No: SC015096
>> > _______________________________________________
>> > ome-devel mailing list
>> > ome-devel at lists.openmicroscopy.org.uk
>> > http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
>>
>>
>
>
> --
> Anatole Chessel, PhD
> Research associate
> University of Cambridge
> Tennis Court Road, Cambridge CB2 1PD
> tel: +44 (0)1223 334065