[ome-devel] OMERO.features API development

Thu May 8 15:20:31 BST 2014

Hi all,

On Thu, May 8, 2014 at 9:40 AM, Simon Li <s.p.li at dundee.ac.uk> wrote:

>  Hi all
>
>  I've started a GitHub repository for trying out some OMERO.features
> ideas based on what I mentioned in the last email:
> https://github.com/manics/omero-features
>
>  There's not a great deal in there at the moment. It just saves
> features into a local HDF5 file using PyTables, and example.py creates a
> table similar to that used by Pyslid (OMERO.searcher). timings.txt shows
> some rough run-times. Key-value row pairs are mapped to table columns;
> however, this means each row has to have the same keys. There's no simple
> way to have a key-value map per column, so for now I'm just storing
> multiple features in one column.
>
>  This is easily convertible to OMERO.tables. Columns could be labelled
> using OMERO annotations (in 5.1 there's a new MapAnnotation), though this
> effectively means each group of features is stored separately and thus
> would need to be queried separately. Alternatively, an auxiliary table could
> be used to store the per-column key-value pairs, similar to how column
> descriptions are currently stored in OMERO.tables.
>
>  A major limitation is that database joins between OMERO and a
> feature table aren't practical. For example, if each feature row is
> labelled with an image ID and you want to select a subset of rows using an
> OMERO query, you have to pass a list of image IDs to the PyTables query
> function, which from my initial testing is very limited in the number of
> parameters it'll handle (I get a stack overflow if too many image IDs are
> passed).
>
>  In practice this means you'd either need the feature table to contain
> any metadata necessary for selecting rows (e.g. dataset ID, experiment
> parameters, annotations), even if this means duplicating information held
> in OMERO, or split the query up (very inefficient). This is probably fine
> for people dealing with features in bulk, where you might download all
> features for a screen for offline processing, but not so good for real-time
> searching such as OMERO.searcher, where you'd either need to store
> everything you need for pre-filtering search results in the table, or read
> all features and do the filtering afterwards.
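
A rough sketch of the "split the query up" option mentioned above, assuming the row filter is expressed as a PyTables-style condition string. The column name `ImageID`, the chunk size of 50 and the `query` callable are all hypothetical; the real limit on the number of terms per query would have to be found by testing.

```python
from itertools import islice


def chunked(ids, size):
    """Yield successive lists of at most `size` IDs."""
    it = iter(ids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def id_condition(column, ids):
    """Build a condition such as '(ImageID == 1) | (ImageID == 2)'."""
    return " | ".join("({} == {:d})".format(column, i) for i in ids)


def select_rows(query, image_ids, chunk_size=50):
    """Run `query(condition)` once per chunk and concatenate the results.

    `query` is any callable taking a condition string and returning an
    iterable of rows (assumption: for example a thin wrapper around a
    PyTables table query). The chunk size is an arbitrary placeholder.
    """
    rows = []
    for chunk in chunked(image_ids, chunk_size):
        rows.extend(query(id_condition("ImageID", chunk)))
    return rows
```

Each chunk costs a separate pass over the table, which is why this gets inefficient as the ID list grows.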
>
>  Probably OK as far as developing the API is concerned, but longer term
> it suggests we need some other storage mechanism. Some of you will remember
> Joaquin Correa from Paris last year. He's currently working on his own
> feature storage implementation at LBL, so potentially this is something we
> could look at for OMERO, and of course there are many other possibilities.
>

People in other groups here (Broad Institute) are looking at MongoDB as an
alternative to HDF5. We are all struggling with the same types of problems
and I don't think anyone has found a solution. In CellProfiler, for HDF5,
we maintain a dataset of per-image indexing information into the HDF5
datasets, and perhaps that's an appropriate hybrid approach for OMERO: the
join returns the slicing information needed for pulling the data out of
the datasets, and you then retrieve the data from HDF5. We store ~1K values
per image per feature (one value per cell or other segmented object), so
for us each round trip down the HDF5 stack deals with a reasonable amount
of data.

HDF5 slicing isn't as flexible as NumPy: you can ask for ranges, but not
for a list of individual dimension coordinates, which is what you'd want if
you were returning data for a large number of rows. Fetching individual
values from HDF5 is painfully slow at our scale of experiments and there is
no really good solution. Maybe all I have to contribute on the topic is "I
feel your pain" :-(
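
To make the hybrid idea concrete, here is a toy sketch under stated assumptions: an in-memory SQLite database stands in for the OMERO database, a flat `array` stands in for one HDF5 feature dataset, and the table and column names (`image`, `feature_index`, `row_start`, `row_stop`) and all values are invented for illustration. It is not how CellProfiler or OMERO actually lay things out.

```python
import sqlite3
from array import array

# Stand-in for one HDF5 dataset: one feature's per-object values for all
# images, stored contiguously image after image.
values = array("d", [0.1, 0.2, 0.3,          # image 10: 3 objects
                     1.5, 1.6,               # image 11: 2 objects
                     2.0, 2.1, 2.2, 2.3])    # image 12: 4 objects

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE image (id INTEGER PRIMARY KEY, dataset_id INTEGER);
    CREATE TABLE feature_index (image_id INTEGER,
                                row_start INTEGER, row_stop INTEGER);
    INSERT INTO image VALUES (10, 1), (11, 2), (12, 1);
    INSERT INTO feature_index VALUES (10, 0, 3), (11, 3, 5), (12, 5, 9);
""")


def fetch_for_dataset(dataset_id):
    """Join in the database, then read one contiguous range per image."""
    cursor = db.execute(
        "SELECT i.id, x.row_start, x.row_stop "
        "FROM image AS i JOIN feature_index AS x ON x.image_id = i.id "
        "WHERE i.dataset_id = ? ORDER BY i.id",
        (dataset_id,))
    return {image_id: list(values[start:stop])
            for image_id, start, stop in cursor}
```

The join only ever returns (start, stop) pairs, so each read from the bulk store is a plain range request of the kind HDF5 handles well.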

>  Simon
>
>
>
>  On 24 Apr 2014, at 12:57, Lee Kamentsky <leek at broadinstitute.org> wrote:
>
>  Hi all,
> Just chiming in, since we were mentioned...
>
> On Wed, Apr 23, 2014 at 5:10 PM, Simon Li <s.p.li at dundee.ac.uk> wrote:
>
>> Hi all
>>
>> Now that OMERO 5.0 is out of the way, and OMERO.searcher and WND-CHARM are
>> either released or very close to release, I think it's time to restart our
>> OMERO.features discussions.
>>
>> We got as far as the idea of a 2D table with any number of key-value
>> pairs on each column and row, so for example each row could be as simple as
>> (OmeroType: Image, OmeroId: 123), or in the case of features which are a
>> function of multiple images or channels (OmeroType: Image, OmeroId: 123,
>> Channel1: 0, Channel2: 3), etc. Columns could for example be
>> (FeatureFamily: WndCharm, Feature: Zernike). Each table cell could either
>> be a scalar or array. Retrieving features could be done by providing
>> key-value pairs to be matched.
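
A toy in-memory sketch of the conceptual interface described above, where rows and columns each carry a key-value map and retrieval is by matching key-value pairs. The class and method names are hypothetical, the second column and all cell values are made up, and this says nothing about what the real back-end would look like.

```python
def matches(metadata, query):
    """Return True if every key-value pair in `query` appears in `metadata`."""
    return all(key in metadata and metadata[key] == value
               for key, value in query.items())


class FeatureTable(object):
    """Toy table: one metadata dict per row and per column."""

    def __init__(self, row_meta, col_meta, cells):
        self.row_meta = row_meta    # list of dicts, one per row
        self.col_meta = col_meta    # list of dicts, one per column
        self.cells = cells          # cells[r][c]: scalar or list

    def select(self, rows=None, cols=None):
        """Return the sub-table whose row and column metadata match."""
        r_idx = [i for i, m in enumerate(self.row_meta)
                 if matches(m, rows or {})]
        c_idx = [j for j, m in enumerate(self.col_meta)
                 if matches(m, cols or {})]
        return [[self.cells[i][j] for j in c_idx] for i in r_idx]


table = FeatureTable(
    row_meta=[
        {"OmeroType": "Image", "OmeroId": 123},
        {"OmeroType": "Image", "OmeroId": 123, "Channel1": 0, "Channel2": 3},
        {"OmeroType": "Image", "OmeroId": 124},
    ],
    col_meta=[
        {"FeatureFamily": "WndCharm", "Feature": "Zernike"},
        {"FeatureFamily": "WndCharm", "Feature": "Haralick"},  # illustrative
    ],
    cells=[
        [[0.1, 0.2], 1.0],
        [[0.3, 0.4], 2.0],
        [[0.5, 0.6], 3.0],
    ],
)

zernike_123 = table.select(rows={"OmeroId": 123}, cols={"Feature": "Zernike"})
```

Note that a query on `OmeroId` alone matches both the single-image row and the two-channel row, so callers would need extra keys to tell them apart.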
>>
>> All of this is still up for discussion, especially since the
>> implementation of this interface could be challenging and there's some
>> redundancy/ambiguity. Just to be clear, the above is a conceptual
>> description of how the API would appear to users, the actual back-end could
>> be completely different.
>>
>> Lee Kamentsky gave us a use case just before Christmas [1]. Chris Coletta
>> and Ivan Cao-Berg are planning to summarise how they see WND-CHARM and
>> OMERO.searcher fitting in. I know a few other people are interested in this
>> discussion, so feel free to respond here or in the forums.
>>
>
>  For us, it's important to link features to regions of interest,
> specifically segmentations of whole cells and cellular compartments. The
> other issues have to do with scalability and the efficiency of retrieving
> large data sets, either by selecting a few features for a large number of
> images (e.g. up to the order of 1,000,000 images and 1,000 entries per
> feature per image) or by selecting many or all features associated with a
> subset of the regions of interest.
>
>  We are also interested in recording tracking data. What's needed here is
> the ability to record a link between a region of interest in one frame of
> a time-series stack and a region of interest in a later frame, and you need
> the flexibility of a many-to-many relationship to represent cell division
> and potentially merging. I'm fairly confident that you could encode that
> sort of thing in a 2-D table which had columns referencing both ROIs.
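
As a small illustration of the tracking idea above, here is a sketch of a two-column link table in which each row references an ROI in one frame and an ROI in a later frame. The ROI IDs are made up, and a plain list of tuples stands in for whatever table storage would really be used.

```python
from collections import defaultdict

# Each row links an ROI in one frame to an ROI in a later frame.
# Columns: (source ROI id, target ROI id).
links = [
    (1, 2),   # ROI 1 continues as ROI 2
    (2, 3),   # ROI 2 divides ...
    (2, 4),   # ... into ROIs 3 and 4
    (3, 5),   # ROIs 3 and 4 merge ...
    (4, 5),   # ... into ROI 5
]

children = defaultdict(list)
parents = defaultdict(list)
for source, target in links:
    children[source].append(target)
    parents[target].append(source)

# A division is a source with several targets; a merge is the reverse.
divisions = sorted(roi for roi, kids in children.items() if len(kids) > 1)
merges = sorted(roi for roi, origins in parents.items() if len(origins) > 1)
```

Because nothing constrains how many rows mention a given ROI in either column, the same table covers one-to-one tracking, division and merging.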
>
>  Finally, we try to capture enough information about the analysis to make
> it reproducible - things like the pipeline used for the analysis, the Git
> hash of the software used to run the analysis and of each image analyzed. I
> think all of that is easily captured, though, in the tables and I doubt we
> need any explicit functionality devoted to that. It might be nice to be
> able to annotate the table itself with attributes in order to document the
> linking of the analysis results to the experimental protocol, but the
> linking could be documented using columns in an experiment-wide table.
>
>>
>> A few of us are planning to meet up at the OME Paris meeting; if you're
>> interested drop me an email.
>>
>> Thanks
>>
>> Simon
>>
>> [1]
>> http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2013-November/002573.html
>>
>>
>> On 7 Nov 2013, at 14:20, Simon Li <s.p.li at dundee.ac.uk> wrote:
>>
>> > Some notes from our meeting yesterday:
>> >
>> > http://www.openmicroscopy.org/site/community/minutes/minigroup/omero-features-meetings/2013-11-06-omero-features-google-hangout
>> >
>> > Summary:
>> > We're thinking of representing features as a 2D array, with metadata
>> > stored as key-value maps attached to the array, or to individual columns
>> > or rows. These keys could describe things such as the feature name
>> > (column), sample metadata (row), algorithm parameters, calculation
>> > pipelines, etc.
>> >
>> > This should work as an OMERO API: in order to retrieve features you'd
>> > pass in a set of key-value pairs, for instance to specify which features
>> > you want and which images/ROIs etc., and OMERO would handle the logic and
>> > return the feature table(s) matching those parameters. Since everyone has
>> > different requirements the keys could be anything; however, we're trying
>> > to define a small set of standard keys, so any suggestions are very
>> > welcome.
>> >
>> > Outside of OMERO we still need a format for transporting features, so
>> > we're thinking of some form of HDF5.
>> >
>> > Simon
>> >
>> >
>> > The University of Dundee is a registered Scottish Charity, No: SC015096
>> > _______________________________________________
>> > ome-devel mailing list
>> > ome-devel at lists.openmicroscopy.org.uk
>> > http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
>>
>>
>>
>
>
>