[ome-devel] OMERO.features API development

Lee Kamentsky leek at broadinstitute.org
Fri Nov 7 20:30:01 GMT 2014


Hi Simon, just a short comment - how does it scale with the number of ROIs
per image? It looks like the initial use case is one ROI / image - what
about 1000?

On Fri, Nov 7, 2014 at 12:18 PM, Simon Li <s.p.li at dundee.ac.uk> wrote:

>  Hi all
>
>  It’s taken a bit longer than anticipated, but here’s a first
> implementation for the features API:
> https://github.com/ome/omero-features
>
>  It’s more or less what I described in my last email. To start with, I
> recommend looking at the README and example.py in the repository root to
> see how it all works. The key features are:
>
>    - Store and retrieve features using an Image-ID or Roi-ID.
>    - Each feature set consists of a number of named features and is
>    per-user-per-group to avoid permissions problems.
>    - The underlying storage format is OMERO.tables which has some
>    limitations, but one of the aims of this work is to figure out exactly what
>    we want to replace it with.
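A minimal sketch of the per-user-per-group idea in the list above, using a plain in-memory dictionary. This is not the omero-features API: the class name, method names, and the feature names in the usage note are all hypothetical, and the real library stores data in OMERO.tables rather than in memory.

```python
class FeatureSetStore:
    """Toy stand-in for a feature set: fixed named features, one table per (user, group)."""

    def __init__(self, feature_names):
        self.feature_names = tuple(feature_names)
        # (user_id, group_id) -> {(object_type, object_id): tuple of values}
        self._tables = {}

    def store(self, user_id, group_id, object_type, object_id, values):
        if object_type not in ("Image", "Roi"):
            raise ValueError("object_type must be 'Image' or 'Roi'")
        if len(values) != len(self.feature_names):
            raise ValueError("expected one value per named feature")
        table = self._tables.setdefault((user_id, group_id), {})
        table[(object_type, object_id)] = tuple(values)

    def fetch(self, user_id, group_id, object_type, object_id):
        # Raises KeyError if this user/group has nothing stored for the object.
        values = self._tables[(user_id, group_id)][(object_type, object_id)]
        return dict(zip(self.feature_names, values))
```

Two users in the same group can then hold different values for the same image without touching each other's rows, e.g. `store.store(1, 10, "Image", 123, [4.0, 0.5])` and `store.store(2, 10, "Image", 123, [9.0, 0.1])`.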
>
> As an example of using it I’ve created a new branch of OMERO.wndcharm:
> https://github.com/manics/omero-wndcharm/tree/omero-features
> Note that only the original WND-CHARM image features are stored using the
> features API. Other information such as classification labels, results, and
> feature weights can’t be stored at present: they would require additional
> columns to store row metadata, which is an obvious next step.
>
>  There’s obviously a lot more that could be handled by the API, but this
> is the sort of thing that really needs input from potential users so take a
> look and let me know what you think, either on this list, as a GitHub
> issue, or on the Wiki.
>
>  Cheers
>
>  Simon
>
>   From: Simon Li <s.p.li at dundee.ac.uk>
> Date: Wednesday, 27 August 2014 15:50
> To: Lee Kamentsky <leek at broadinstitute.org>, "Coletta, Christopher
> (NIH/NIA/IRP) [E]" <christopher.coletta at nih.gov>, "Ivan E. Cao-Berg" <
> icaoberg at andrew.cmu.edu>, Joaquin Correa <joaquincorrea at lbl.gov>
> Cc: OME Development <ome-devel at lists.openmicroscopy.org.uk>
> Subject: Re: [ome-devel] OMERO.features API development
>
>   Hi all
>
>  I had a discussion with Jason and Jean-Marie in Dundee a couple of weeks
> ago on how to make progress with the OMERO.features API. The problem of
> properly storing all the metadata we require is obviously extremely
> important, but it's going to take a while to figure out.
>
>  So, to begin with we came up with the idea of a simplified API that
> would work as a client-side Python library and should be relatively simple
> to implement. I've described it on the OMERO.features wiki:
> https://github.com/ome/omero-features/wiki/API-Outline-V1
>
>  The key points are:
>
>  * Store the Image and/or ROI ID (which allows us to specify
> planes/channels/tiles without needing to add columns to the table), and
> feature names/values, using OMERO.tables.
> * Provide methods to store a feature row, retrieve a feature row by
> Image/ROI ID, and to select rows by querying feature values (simple
> comparison operators), ideally with a simple to understand syntax.
> * Implement as a client-side Python library.
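The key points above could look roughly like this from the client side. It is only a sketch over a list of dictionaries: the function names, the `ImageID`/`RoiID` column names, and the feature names are assumptions made for illustration, not the API described on the wiki.

```python
import operator

# Simple comparison operators, keyed by the symbol a user would type.
_OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def fetch_by_image(rows, image_id):
    """Return the first feature row stored for image_id, or None."""
    for row in rows:
        if row["ImageID"] == image_id:
            return row
    return None


def select_rows(rows, feature, op, value):
    """Return rows whose named feature satisfies `feature op value`."""
    if op not in _OPERATORS:
        raise ValueError("unsupported operator: %r" % (op,))
    compare = _OPERATORS[op]
    return [row for row in rows if compare(row[feature], value)]


# Hypothetical feature rows; a RoiID of None means a whole-image feature row.
rows = [
    {"ImageID": 1, "RoiID": None, "area": 120.0, "eccentricity": 0.2},
    {"ImageID": 2, "RoiID": None, "area": 80.0, "eccentricity": 0.7},
    {"ImageID": 3, "RoiID": None, "area": 200.0, "eccentricity": 0.5},
]
```

With these rows, `select_rows(rows, "area", ">", 100.0)` picks out images 1 and 3.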
>
>  This means the more complicated requirements such as linking objects
> other than images/ROIs, more complicated feature metadata, and supporting
> both efficient bulk and random access will be left for the next iteration.
> As always, comments are welcome - if you think this is too simple to be
> useful don't be afraid to say so.
>
>  Cheers
>
>  Simon
>
>
>   On 08/05/2014 15:20, "Lee Kamentsky" <leek at broadinstitute.org> wrote:
>
>   Hi all,
>
>
> On Thu, May 8, 2014 at 9:40 AM, Simon Li <s.p.li at dundee.ac.uk> wrote:
>
>>  Hi all
>>
>>  I've started a GitHub repository for trying out some OMERO.features
>> ideas based on what I mentioned in the last email:
>> https://github.com/manics/omero-features
>>
>>  There's not a great deal in there at the moment. It's just saving
>> features into a local HDF5 file using PyTables, and example.py creates a
>> table similar to that used by Pyslid (OMERO.searcher). timings.txt shows
>> some rough run-times. Key-value row pairs are mapped to table columns;
>> however, this means each row has to have the same keys. There's no simple
>> way to have a key-value map per column, so for now I'm just storing
>> multiple features in one column.
>>
>>  This is easily convertible to OMERO.tables. Columns could be labelled
>> using OMERO annotations (in 5.1 there's a new MapAnnotation), though it
>> effectively means each group of features is stored separately and thus
>> would need to be queried separately. Alternatively an auxiliary table could
>> be used to store the per-column key-value pairs, similar to how column
>> descriptions are currently stored in OMERO.tables.
>>
>>  A major limitation is that database joins between OMERO and a
>> feature-table aren't practical. For example, if each feature row is
>> labelled with an image ID, and you want to select a subset of rows using an
>> OMERO query, you have to pass a list of image IDs to the PyTables query
>> function which from my initial testing is very limited in the number of
>> parameters it'll handle (I get a stack overflow if too many image IDs are
>> passed).
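One way to picture the "split the query up" workaround is to break the ID list into chunks and build one condition string per chunk. The sketch below only builds the strings; the `ImageID` column name and the chunk size of 50 are assumptions, not a measured PyTables limit, and each string would still cost a separate query.

```python
def id_conditions(image_ids, column="ImageID", chunk_size=50):
    """Yield one OR-ed condition string per chunk of unique image IDs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    unique_ids = sorted(set(image_ids))
    for start in range(0, len(unique_ids), chunk_size):
        chunk = unique_ids[start:start + chunk_size]
        # int() guards against non-integer values ending up in the expression.
        yield " | ".join("({} == {})".format(column, int(i)) for i in chunk)
```

Each yielded string could then be passed to a PyTables query such as `Table.read_where(condition)` and the partial results concatenated, which is exactly the repeated round-trip that makes this approach inefficient.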
>>
>>  In practice this means you'd either need the feature table to contain
>> any metadata necessary for selecting rows (e.g. dataset ID, experiment
>> parameters, annotations) even if this means duplicating information held in
>> OMERO, or split the query up (very inefficient). This is probably fine for
>> people dealing with features in bulk where you might download all features
>> for a screen for offline processing, not so good for real-time searching
>> such as OMERO.searcher where you'd either need to store everything you need
>> for pre-filtering search results in the table, or read all features and do
>> the filtering afterwards.
>>
>>  Probably OK as far as developing the API is concerned, but longer term
>> it suggests we need some other storage mechanism. Some of you will remember
>> Joaquin Correa from Paris last year. He's currently working on his own
>> feature storage implementation at LBL, so potentially this is something we
>> could look at for OMERO, and of course there are many other possibilities.
>>
>   People in other groups here (Broad Institute) are looking at MongoDB
> as an alternative to HDF5 - we are all sort of struggling with the same
> types of problems and I don't think anyone has found a solution. In
> CellProfiler, for HDF5, we maintain a dataset of per-image indexing
> information into the HDF5 datasets and perhaps that's an appropriate hybrid
> approach for OMERO - the join returns the slicing information needed for
> pulling the data out of the datasets and you then retrieve the data from
> HDF5. We store ~1K values per image per feature (one value per cell or
> other segmented object), so for us, each round-trip down the HDF5 stack
> deals with a reasonable amount of data. HDF5 slicing isn't as flexible as
> Numpy - you can ask for ranges, but not a list of individual dimension
> coordinates which would be what you'd want if you were returning data for a
> large number of rows. Fetching individual values from HDF5 is painfully
> slow at our scale of experiments. There's no really good solution; maybe
> all I have to contribute on the topic is "I feel your pain" :-(
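The hybrid approach described above might be sketched as follows: a per-image index maps each image to a half-open (start, stop) range, adjacent ranges are merged, and each merged range is read once. A plain Python list stands in for the HDF5 dataset here, and the function and variable names are hypothetical rather than CellProfiler's.

```python
def coalesce(ranges):
    """Merge touching or overlapping half-open (start, stop) ranges."""
    merged = []
    for start, stop in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged


def fetch_per_image(dataset, index, image_ids):
    """Return ({image_id: values}, number_of_range_reads)."""
    wanted = {image_id: index[image_id] for image_id in image_ids}
    result = {}
    n_reads = 0
    for start, stop in coalesce(wanted.values()):
        block = dataset[start:stop]  # one contiguous range read
        n_reads += 1
        for image_id, (s, e) in wanted.items():
            if start <= s and e <= stop:
                result[image_id] = block[s - start:e - start]
    return result, n_reads
```

Images whose values sit next to each other in the dataset then share a single read, which matters when range reads are cheap but per-value reads are slow.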
>
>>  Simon
>>
>>
>>
>>  On 24 Apr 2014, at 12:57, Lee Kamentsky <leek at broadinstitute.org> wrote:
>>
>>  Hi all,
>> Just chiming in, since we were mentioned...
>>
>> On Wed, Apr 23, 2014 at 5:10 PM, Simon Li <s.p.li at dundee.ac.uk> wrote:
>>
>>> Hi all
>>>
>>> Now that OMERO 5.0 is out of the way, and OMERO.searcher and WND-CHARM
>>> are either released or very close to release, I think it's time to restart
>>> our OMERO.features discussions.
>>>
>>> We got as far as the idea of a 2D table with any number of key-value
>>> pairs on each column and row, so for example each row could be as simple as
>>> (OmeroType: Image, OmeroId: 123), or in the case of features which are a
>>> function of multiple images or channels (OmeroType: Image, OmeroId: 123,
>>> Channel1: 0, Channel2: 3), etc. Columns could for example be
>>> (FeatureFamily: WndCharm, Feature: Zernike). Each table cell could either
>>> be a scalar or array. Retrieving features could be done by providing
>>> key-value pairs to be matched.
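As a conceptual illustration of the table described above, the sketch below attaches a key-value dictionary to every row and column and selects cells by matching key-value pairs. The dictionary layout and function names are assumptions, the cell values are made up, and "Placeholder" is not a real feature name; the actual back-end could look completely different.

```python
def matches(metadata, query):
    """True if every key-value pair in query is present in metadata."""
    return all(key in metadata and metadata[key] == value
               for key, value in query.items())


def select(table, row_query, column_query):
    """Return the sub-table of cells whose row and column metadata match."""
    row_idx = [i for i, meta in enumerate(table["rows"])
               if matches(meta, row_query)]
    col_idx = [j for j, meta in enumerate(table["columns"])
               if matches(meta, column_query)]
    return [[table["cells"][i][j] for j in col_idx] for i in row_idx]


# Hypothetical table: two rows, two columns; cells may be scalars or arrays.
table = {
    "rows": [
        {"OmeroType": "Image", "OmeroId": 123},
        {"OmeroType": "Image", "OmeroId": 123, "Channel1": 0, "Channel2": 3},
    ],
    "columns": [
        {"FeatureFamily": "WndCharm", "Feature": "Zernike"},
        {"FeatureFamily": "WndCharm", "Feature": "Placeholder"},
    ],
    "cells": [
        [[0.1, 0.2], 1.5],
        [[0.3, 0.4], 2.5],
    ],
}
```

A query such as `select(table, {"OmeroId": 123}, {"Feature": "Zernike"})` returns the Zernike cell for both rows, which shows the ambiguity mentioned in the thread: a partial key match can select more rows than intended.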
>>>
>>> All of this is still up for discussion, especially since the
>>> implementation of this interface could be challenging and there's some
>>> redundancy/ambiguity. Just to be clear, the above is a conceptual
>>> description of how the API would appear to users, the actual back-end could
>>> be completely different.
>>>
>>> Lee Kamentsky gave us a use case just before Christmas [1], and Chris
>>> Coletta and Ivan Cao-Berg are planning to summarise how they see WND-CHARM
>>> and OMERO.searcher fitting in. I know a few other people are interested in
>>> this discussion, so feel free to respond here or in the forums.
>>>
>>
>>  For us, it's important to link features to regions of interest,
>> specifically segmentations of whole cells and cellular compartments. The
>> other issues have to do with scalability and the efficiency of retrieving
>> large data sets either by selecting a few features for a large number of
>> images (e.g. up to on the order of 1,000,000 images and 1,000 entries per
>> feature per image) or by selecting many or all features associated with a
>> subset of the regions of interest.
>>
>>  We are also interested in recording tracking data. What's needed here
>> is the ability to record a link between a region of interest in one frame
>> of a time-series stack and a region of interest in a later frame. You
>> need the flexibility of a many-to-many relationship to represent cell
>> division and potentially merging. I'm fairly confident that you could
>> encode that sort of thing in a 2-D table which had columns referencing
>> both ROIs.
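A small sketch of how such a link table could encode division and merging: each row references an ROI in an earlier frame and an ROI in a later frame. The ROI IDs and function names are invented for illustration.

```python
from collections import defaultdict

# Each row links (ROI ID in earlier frame, ROI ID in later frame).
links = [
    (10, 20),
    (10, 21),  # ROI 10 divides into ROIs 20 and 21
    (11, 22),
    (12, 22),  # ROIs 11 and 12 merge into ROI 22
    (13, 23),  # ROI 13 simply continues as ROI 23
]


def build_lookups(link_rows):
    """Return (children, parents) dictionaries built from the link rows."""
    children = defaultdict(list)
    parents = defaultdict(list)
    for earlier, later in link_rows:
        children[earlier].append(later)
        parents[later].append(earlier)
    return children, parents


def divisions(children):
    """ROIs in the earlier frame that link to more than one later ROI."""
    return sorted(roi for roi, kids in children.items() if len(kids) > 1)


def merges(parents):
    """ROIs in the later frame that link back to more than one earlier ROI."""
    return sorted(roi for roi, origins in parents.items() if len(origins) > 1)
```

Because both columns are ordinary ROI references, the same two-column layout covers one-to-one tracks, divisions, and merges without any special cases.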
>>
>>  Finally, we try to capture enough information about the analysis to
>> make it reproducible - things like the pipeline used for the analysis, the
>> Git hash of the software used to run the analysis and of each image
>> analyzed. I think all of that is easily captured, though, in the tables and
>> I doubt we need any explicit functionality devoted to that. It might be
>> nice to be able to annotate the table itself with attributes in order to
>> document the linking of the analysis results to the experimental protocol,
>> but the linking could be documented using columns in an experiment-wide
>> table.
>>
>>>
>>> A few of us are planning to meet up at the OME Paris meeting - if you're
>>> interested drop me an email.
>>>
>>> Thanks
>>>
>>> Simon
>>>
>>> [1]
>>> http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2013-November/002573.html
>>>
>>>
>>> On 7 Nov 2013, at 14:20, Simon Li <s.p.li at dundee.ac.uk> wrote:
>>>
>>> > Some notes from our meeting yesterday:
>>> >
>>> > http://www.openmicroscopy.org/site/community/minutes/minigroup/omero-features-meetings/2013-11-06-omero-features-google-hangout
>>> >
>>> > Summary:
>>> > We're thinking of representing features as a 2D array, with metadata
>>> > stored as key-value maps attached to the array, or individual columns or
>>> > rows. These keys could describe things such as the feature name (column),
>>> > sample metadata (row), algorithm parameters, calculation pipelines, etc.
>>> >
>>> > This should work as an OMERO API- in order to retrieve features you'd
>>> > pass in a set of key-value pairs, for instance to specify which features
>>> > you want and which images/ROIs etc, and OMERO would handle the logic and
>>> > return the feature table(s) matching those parameters. Since everyone has
>>> > different requirements the keys could be anything, however we're trying to
>>> > define a small set of standard keys- any suggestions are very welcome.
>>> >
>>> > Outside of OMERO we still need a format for transporting features, so
>>> > we're thinking some form of HDF5.
>>> >
>>> > Simon
>>> >
>>> >
>>> > The University of Dundee is a registered Scottish Charity, No: SC015096
>>> > _______________________________________________
>>> > ome-devel mailing list
>>> > ome-devel at lists.openmicroscopy.org.uk
>>> > http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>