[ome-devel] OMERO.features API development

Fri Nov 7 23:44:02 GMT 2014

Hi Lee

I haven't done any proper benchmarking, though I did attempt to create a
feature set for an image with one ROI per pixel; the bottleneck was the
time taken to create the ROIs in OMERO. When it comes to retrieval, if
you've stored features for a ROI then both the ROI ID and the associated
Image ID should be present in the table, so retrieval should be fast until
you hit the limits of PyTables/OMERO.tables.
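
To illustrate the row layout described above, here is a minimal sketch in plain Python. It is not the omero-features code: the class name, the column names ImageID and RoiID, and the in-memory index are all assumptions made for illustration, standing in for whatever the real table does.

```python
from collections import defaultdict


class FeatureRows:
    """Hypothetical in-memory stand-in for a feature table.

    Every row carries both an Image ID and (optionally) a ROI ID, so a
    lookup by either key does not need a second query against OMERO.
    """

    def __init__(self, feature_names):
        self.feature_names = list(feature_names)
        self.rows = []                     # (image_id, roi_id, values)
        self.by_image = defaultdict(list)  # image_id -> row numbers
        self.by_roi = {}                   # roi_id -> row number

    def store(self, image_id, roi_id, values):
        if len(values) != len(self.feature_names):
            raise ValueError("expected one value per feature name")
        self.rows.append((image_id, roi_id, list(values)))
        row_number = len(self.rows) - 1
        self.by_image[image_id].append(row_number)
        if roi_id is not None:
            self.by_roi[roi_id] = row_number

    def fetch_by_roi(self, roi_id):
        # One row per ROI in this sketch
        return self.rows[self.by_roi[roi_id]][2]

    def fetch_by_image(self, image_id):
        # All rows for the image, in insertion order
        return [self.rows[n][2] for n in self.by_image.get(image_id, [])]
```

With 1000 ROIs on one image this would hold 1000 rows sharing the same Image ID, each with its own ROI ID.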

Simon

On Fri, 2014-11-07 at 15:30 -0500, Lee Kamentsky wrote:
> Hi Simon, just a short comment - how does it scale with the number of
> ROIs per image? It looks like the initial use case is one ROI / image
> - what about 1000?
>
> On Fri, Nov 7, 2014 at 12:18 PM, Simon Li <s.p.li at dundee.ac.uk> wrote:
>         Hi all
>
>
>         It’s taken a bit longer than anticipated, but here’s a first
>         implementation for the features API:
>         https://github.com/ome/omero-features
>
>
>         It’s more or less what I described in my last email. To
>         start with, I recommend looking at the README and example.py in
>         the repository root to see how it all works. The key features
>         are:
>               * Store and retrieve features using an Image-ID or
>                 Roi-ID.
>               * Each feature set consists of a number of named
>                 features and is per-user-per-group to avoid
>                 permissions problems.
>               * The underlying storage format is OMERO.tables which
>                 has some limitations, but one of the aims of this work
>                 is to figure out exactly what we want to replace it
>                 with.
>         As an example of using it I’ve created a new branch of
>         OMERO.wndcharm:
>         https://github.com/manics/omero-wndcharm/tree/omero-features
>         Note that only the original WND-CHARM image features are stored
>         using the features API; other information such as
>         classification labels, results, and feature weights can’t be
>         stored at present. That would require additional columns to
>         store row metadata, which is an obvious next step.
>
>
>         There’s obviously a lot more that could be handled by the API,
>         but this is the sort of thing that really needs input from
>         potential users so take a look and let me know what you think,
>         either on this list, as a GitHub issue, or on the Wiki.
>
>
>         Cheers
>
>
>         Simon
>
>
>         From: Simon Li <s.p.li at dundee.ac.uk>
>         Date: Wednesday, 27 August 2014 15:50
>         To: Lee Kamentsky <leek at broadinstitute.org>, "Coletta,
>         Christopher (NIH/NIA/IRP) [E]" <christopher.coletta at nih.gov>,
>         "Ivan E. Cao-Berg" <icaoberg at andrew.cmu.edu>, Joaquin Correa
>         <joaquincorrea at lbl.gov>
>         Cc: OME Development <ome-devel at lists.openmicroscopy.org.uk>
>         Subject: Re: [ome-devel] OMERO.features API development
>
>
>
>         Hi all
>
>
>         I had a discussion with Jason and Jean-Marie in Dundee a
>         couple of weeks ago on how to make progress with the
>         OMERO.features API. The problem of properly storing all the
>         metadata we require is obviously extremely important, but it's
>         going to take a while to figure out.
>
>
>         So, to begin with we came up with the idea of a simplified API
>         that would work as a client-side Python library and should be
>         relatively simple to implement. I've described it on the
>         OMERO.features wiki:
>         https://github.com/ome/omero-features/wiki/API-Outline-V1
>
>
>         The key points are:
>
>
>         * Store the Image and/or ROI ID (which allows us to specify
>         planes/channels/tiles without needing to add columns to the
>         table), and feature names/values, using OMERO.tables.
>         * Provide methods to store a feature row, retrieve a feature
>         row by Image/ROI ID, and to select rows by querying feature
>         values (simple comparison operators), ideally with a simple to
>         understand syntax.
>         * Implement as a client-side Python library.
>
>
>         This means the more complicated requirements such as linking
>         objects other than images/ROIs, more complicated feature
>         metadata, and supporting both efficient bulk and random access
>         will be left for the next iteration. As always, comments are
>         welcome; if you think this is too simple to be useful, don't be
>         afraid to say so.
>
>
>         Cheers
>
>
>         Simon
>
>
>
>
>         On 08/05/2014 15:20, "Lee Kamentsky" <leek at broadinstitute.org>
>         wrote:
>
>
>                 Hi all,
>
>
>                 On Thu, May 8, 2014 at 9:40 AM, Simon Li
>                 <s.p.li at dundee.ac.uk> wrote:
>                         Hi all
>
>
>                         I've started a GitHub repository for trying
>                         out some OMERO.features ideas based on what I
>                         mentioned in the last email:
>                         https://github.com/manics/omero-features
>
>
>                         There's not a great deal in there at the
>                         moment. It just saves features into a local
>                         HDF5 file using PyTables, and example.py
>                         creates a table similar to that used by Pyslid
>                         (OMERO.searcher). timings.txt shows some rough
>                         run-times. Key-value row pairs are mapped to
>                         table columns; however, this means each row has
>                         to have the same keys. There's no simple way
>                         to have a key-value map per column, so for now
>                         I'm just storing multiple features in one
>                         column.
>
>
>                         This is easily convertible to OMERO.tables.
>                         Columns could be labelled using OMERO
>                         annotations (in 5.1 there's a new
>                         MapAnnotation), though it effectively means
>                         each group of features is stored separately
>                         and thus would need to be queried separately.
>                         Alternatively an auxiliary table could be used
>                         to store the per-column key-value pairs,
>                         similar to how column descriptions are
>                         currently stored in OMERO.tables.
>
>
>                         A major limitation is that database joins
>                         between OMERO and a feature-table aren't
>                         practical. For example, if each feature row is
>                         labelled with an image ID, and you want to
>                         select a subset of rows using an OMERO query,
>                         you have to pass a list of image IDs to the
>                         PyTables query function, which from my initial
>                         testing is very limited in the number of
>                         parameters it'll handle (I get a stack
>                         overflow if too many image IDs are passed).
>
>
>                         In practice this means you'd either need the
>                         feature table to contain any metadata
>                         necessary for selecting rows (e.g. dataset ID,
>                         experiment parameters, annotations) even if
>                         this means duplicating information held in
>                         OMERO, or split the query up (very
>                         inefficient). This is probably fine for people
>                         dealing with features in bulk where you might
>                         download all features for a screen for offline
>                         processing, not so good for real-time
>                         searching such as OMERO.searcher where you'd
>                         either need to store everything you need for
>                         pre-filtering search results in the table, or
>                         read all features and do the filtering
>                         afterwards.
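
A rough sketch of the "split the query up" option mentioned above, in plain Python. The function name, the ImageID column name, and the batch size of 100 are assumptions; the thread does not say how many IDs can be passed before the query fails.

```python
def batched_conditions(image_ids, column="ImageID", batch_size=100):
    """Yield one condition string per batch of image IDs.

    Each string has the form "(ImageID == 1) | (ImageID == 2)", so no
    single query expression has to carry the whole ID list.
    """
    ids = sorted(set(int(i) for i in image_ids))
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        yield " | ".join("({} == {})".format(column, i) for i in batch)


# Example: three distinct IDs split into batches of two
conditions = list(batched_conditions([3, 1, 2, 2], batch_size=2))
```

Each string could then be handed to a table query call one batch at a time, with the results concatenated on the client. This is the inefficient route described above, since every batch is a separate pass over the table.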
>
>
>                         Probably OK as far as developing the API is
>                         concerned, but longer term it suggests we need
>                         some other storage mechanism. Some of you will
>                         remember Joaquin Correa from Paris last year.
>                         He's currently working on his own feature
>                         storage implementation at LBL, so potentially
>                         this is something we could look at for OMERO,
>                         and of course there are many other
>                         possibilities.
>
>
>                 People in other groups here (Broad Institute) are
>                 looking at MongoDB as an alternative to HDF5 - we are
>                 all sort of struggling with the same types of problems
>                 and I don't think anyone has found a solution. In
>                 CellProfiler, for HDF5, we maintain a dataset of
>                 per-image indexing information into the HDF5 datasets
>                 and perhaps that's an appropriate hybrid approach for
>                 OMERO - the join returns the slicing information
>                 needed for pulling the data out of the datasets and
>                 you then retrieve the data from HDF5. We store ~1K
>                 values per image per feature (one value per cell or
>                 other segmented object), so for us, each round-trip
>                 down the HDF5 stack deals with a reasonable amount of
>                 data. HDF5 slicing isn't as flexible as NumPy - you
>                 can ask for ranges, but not a list of individual
>                 dimension coordinates which would be what you'd want
>                 if you were returning data for a large number of rows.
>                 Fetching individual values from HDF5 is painfully slow
>                 for our scale of experiments and there's no really good
>                 solution; maybe all I have to contribute on the topic
>                 is "I feel your pain" :-(
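
One way to picture the hybrid approach above: if a join or index lookup returns the row numbers wanted, they can be merged into contiguous ranges before going to HDF5, so each read is a range rather than a single value. This is only a sketch in plain Python; the function name is made up and it is not CellProfiler's implementation.

```python
def coalesce_ranges(row_indices):
    """Merge row indices into half-open (start, stop) ranges."""
    ranges = []
    for i in sorted(set(row_indices)):
        if ranges and ranges[-1][1] == i:
            # i directly follows the previous range, so extend it
            ranges[-1][1] = i + 1
        else:
            ranges.append([i, i + 1])
    return [tuple(r) for r in ranges]


# Example: six scattered rows collapse into three range reads
wanted = coalesce_ranges([5, 1, 2, 3, 9, 10])
```

How much this helps depends on how the rows are laid out: if the rows for one image are stored next to each other, a per-image request becomes a single range.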
>
>
>
>                         Simon
>
>
>
>
>
>                         On 24 Apr 2014, at 12:57, Lee Kamentsky
>                         <leek at broadinstitute.org> wrote:
>
>                         > Hi all,
>                         > Just chiming in, since we were mentioned...
>                         >
>                         > On Wed, Apr 23, 2014 at 5:10 PM, Simon Li
>                         > <s.p.li at dundee.ac.uk> wrote:
>                         >         Hi all
>                         >
>                         >         Now that OMERO 5.0 is out of the
>                         >         way, and OMERO.searcher and WND-CHARM
>                         >         are either released or very close to
>                         >         release, I think it's time to
>                         >         restart our OMERO.features
>                         >         discussions.
>                         >
>                         >         We got as far as the idea of a 2D
>                         >         table with any number of key-value
>                         >         pairs on each column and row, so for
>                         >         example each row could be as simple
>                         >         as (OmeroType: Image, OmeroId: 123),
>                         >         or in the case of features which are
>                         >         a function of multiple images or
>                         >         channels (OmeroType: Image, OmeroId:
>                         >         123, Channel1: 0, Channel2: 3), etc.
>                         >         Columns could for example be
>                         >         (FeatureFamily: WndCharm, Feature:
>                         >         Zernike). Each table cell could
>                         >         either be a scalar or array.
>                         >         Retrieving features could be done by
>                         >         providing key-value pairs to be
>                         >         matched.
>                         >
>                         >         All of this is still up for
>                         >         discussion, especially since the
>                         >         implementation of this interface
>                         >         could be challenging and there's
>                         >         some redundancy/ambiguity. Just to
>                         >         be clear, the above is a conceptual
>                         >         description of how the API would
>                         >         appear to users, the actual back-end
>                         >         could be completely different.
>                         >
>                         >         Lee Kamentsky gave us a use case
>                         >         just before Christmas [1], Chris
>                         >         Coletta and Ivan Cao-Berg are
>                         >         planning to summarise how they see
>                         >         WND-CHARM and OMERO.searcher fitting
>                         >         in. I know a few other people are
>                         >         interested in this discussion, so
>                         >         feel free to respond here or in the
>                         >         forums.
>                         >
>                         >
>                         > For us, it's important to link features to
>                         > regions of interest, specifically
>                         > segmentations of whole cells and cellular
>                         > compartments. The other issues have to do
>                         > with scalability and the efficiency of
>                         > retrieving large data sets either by
>                         > selecting a few features for a large number
>                         > of images (e.g. up to on the order of
>                         > 1,000,000 images and 1,000 entries per
>                         > feature per image) or by selecting many or
>                         > all features associated with a subset of the
>                         > regions of interest.
>                         >
>                         >
>                         > We are also interested in recording tracking
>                         > data. What's needed here is the ability to
>                         > record a link between the region of interest
>                         > in one frame of a time-series stack with a
>                         > region of interest in a later frame and you
>                         > need the flexibility of a many-many
>                         > relationship to represent cell division and
>                         > potentially merging. I'm fairly confident
>                         > that you could encode that sort of thing in
>                         > a 2-D table which had columns referencing
>                         > both ROIs.
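
As a sketch of the two-column idea above, assuming nothing about the eventual storage: each row links a ROI in one frame to a ROI in a later frame, and division or merging just means a ROI ID appears in more than one row. The IDs and function name below are made up for illustration.

```python
from collections import defaultdict

# Each row is (earlier_roi_id, later_roi_id); all IDs are hypothetical.
links = [
    (1, 2),  # ROI 1 continues as ROI 2
    (2, 3),  # ROI 2 divides into ROIs 3 and 4
    (2, 4),
    (3, 5),  # ROIs 3 and 4 merge into ROI 5
    (4, 5),
]


def build_lookup(rows):
    """Index the link rows in both directions."""
    successors = defaultdict(list)
    predecessors = defaultdict(list)
    for earlier, later in rows:
        successors[earlier].append(later)
        predecessors[later].append(earlier)
    return successors, predecessors


successors, predecessors = build_lookup(links)
```

Here successors[2] lists both daughters of a division and predecessors[5] lists both ROIs that merged, which is the many-to-many relationship described above.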
>                         >
>                         >
>                         > Finally, we try to capture enough
>                         > information about the analysis to make it
>                         > reproducible - things like the pipeline used
>                         > for the analysis, the Git hash of the
>                         > software used to run the analysis and of
>                         > each image analyzed. I think all of that is
>                         > easily captured, though, in the tables and I
>                         > doubt we need any explicit functionality
>                         > devoted to that. It might be nice to be able
>                         > to annotate the table itself with attributes
>                         > in order to document the linking of the
>                         > analysis results to the experimental
>                         > protocol, but the linking could be
>                         > documented using columns in an
>                         > experiment-wide table.
>                         >
>                         >         A few of us are planning to meet up
>                         >         at the OME Paris meeting; if you're
>                         >         interested drop me an email.
>                         >
>                         >         Thanks
>                         >
>                         >         Simon
>                         >
>                         >         [1]
>                         >         http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2013-November/002573.html
>                         >
>                         >
>                         >         On 7 Nov 2013, at 14:20, Simon Li
>                         >         <s.p.li at dundee.ac.uk> wrote:
>                         >
>                         >         > Some notes from our meeting
>                         >         > yesterday:
>                         >         >
>                         >         > http://www.openmicroscopy.org/site/community/minutes/minigroup/omero-features-meetings/2013-11-06-omero-features-google-hangout
>                         >         >
>                         >         > Summary:
>                         >         > We're thinking of representing
>                         >         > features as a 2D array, with
>                         >         > metadata stored as key-value maps
>                         >         > attached to the array, or individual
>                         >         > columns or rows. These keys could
>                         >         > describe things such as the feature
>                         >         > name (column), sample metadata
>                         >         > (row), algorithm parameters,
>                         >         > calculation pipelines, etc.
>                         >         >
>                         >         > This should work as an OMERO API:
>                         >         > in order to retrieve features you'd
>                         >         > pass in a set of key-value pairs,
>                         >         > for instance to specify which
>                         >         > features you want and which
>                         >         > images/ROIs etc, and OMERO would
>                         >         > handle the logic and return the
>                         >         > feature table(s) matching those
>                         >         > parameters. Since everyone has
>                         >         > different requirements the keys
>                         >         > could be anything; however, we're
>                         >         > trying to define a small set of
>                         >         > standard keys; any suggestions are
>                         >         > very welcome.
>                         >         >
>                         >         > Outside of OMERO we still need a
>                         >         > format for transporting features, so
>                         >         > we're thinking some form of HDF5.
>                         >         >
>                         >         > Simon
>                         >         >
>                         >         >
>                         >         > The University of Dundee is a
>                         >         > registered Scottish Charity, No:
>                         >         > SC015096
>                         >         >
>                         >         > _______________________________________________
>                         >         > ome-devel mailing list
>                         >         > ome-devel at lists.openmicroscopy.org.uk
>                         >         > http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
>                         >
>                         >
>                         >
>                         >
>
>
>
>
>
>
>
>
