[ome-devel] OMERO.features API development
Simon Li
s.p.li at dundee.ac.uk
Fri Nov 7 23:44:02 GMT 2014
Hi Lee
I haven't done any proper benchmarking, though I did attempt to create a
feature set for an image with one ROI per pixel. The bottleneck was the
time taken to create the ROIs in OMERO. When it comes to retrieval, if
you've stored features for a ROI, both the ROI ID and the associated
Image ID should be present in the table, so retrieval should be fast
until you hit the limits of PyTables/OMERO.tables.
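
To make the retrieval point concrete, here is a rough sketch of why
having both IDs on every row helps. It is a plain-Python stand-in, not
the omero-features API and not how OMERO.tables performs lookups; the
class and method names (FeatureIndex, store, fetch_by_roi,
fetch_by_image) are made up for illustration.

```python
from collections import defaultdict


class FeatureIndex:
    """Toy in-memory stand-in for a feature table keyed by Image ID and ROI ID.

    Hypothetical names throughout; this is not the omero-features API.
    """

    def __init__(self):
        self.rows = []                      # (image_id, roi_id, values)
        self.by_roi = {}                    # roi_id -> row number
        self.by_image = defaultdict(list)   # image_id -> row numbers

    def store(self, image_id, roi_id, values):
        # Record the row once and index it under both IDs.
        row_number = len(self.rows)
        self.rows.append((image_id, roi_id, list(values)))
        self.by_roi[roi_id] = row_number
        self.by_image[image_id].append(row_number)

    def fetch_by_roi(self, roi_id):
        # Single dictionary lookup, independent of the number of rows.
        return self.rows[self.by_roi[roi_id]]

    def fetch_by_image(self, image_id):
        # Returns every ROI row that belongs to the image.
        return [self.rows[n] for n in self.by_image.get(image_id, [])]


index = FeatureIndex()
for roi_id in range(1000):                  # 1000 ROIs on one image
    index.store(image_id=7, roi_id=roi_id, values=[roi_id * 0.5])

print(len(index.fetch_by_image(7)))
print(index.fetch_by_roi(42))
```

With 1000 ROIs on one image, fetching by Image ID returns all 1000
rows and fetching by ROI ID returns one, without scanning the table.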
Simon
On Fri, 2014-11-07 at 15:30 -0500, Lee Kamentsky wrote:
> Hi Simon, just a short comment - how does it scale with the number of
> ROIs per image? It looks like the initial use case is one ROI / image
> - what about 1000?
>
> On Fri, Nov 7, 2014 at 12:18 PM, Simon Li <s.p.li at dundee.ac.uk> wrote:
> Hi all
>
>
> It’s taken a bit longer than anticipated, but here’s a first
> implementation for the features API:
> https://github.com/ome/omero-features
>
>
> It’s more or less what I described in my last email. To start
> with, I recommend looking at the README and example.py in the
> repository root to see how it all works. The key features
> are:
> * Store and retrieve features using an Image-ID or
> Roi-ID.
> * Each feature set consists of a number of named
> features and is per-user-per-group to avoid
> permissions problems.
> * The underlying storage format is OMERO.tables which
> has some limitations, but one of the aims of this work
> is to figure out exactly what we want to replace it
> with.
> As an example of using it I’ve created a new branch of
> OMERO.wndcharm:
> https://github.com/manics/omero-wndcharm/tree/omero-features
> Note that only the original WND-CHARM image features are stored
> using the features API; other information such as
> classification labels, results, and feature weights can’t be
> stored at present. They would require additional columns to
> store row metadata, which is an obvious next step.
>
>
> There’s obviously a lot more that could be handled by the API,
> but this is the sort of thing that really needs input from
> potential users so take a look and let me know what you think,
> either on this list, as a GitHub issue, or on the Wiki.
>
>
> Cheers
>
>
> Simon
>
>
> From: Simon Li <s.p.li at dundee.ac.uk>
> Date: Wednesday, 27 August 2014 15:50
> To: Lee Kamentsky <leek at broadinstitute.org>, "Coletta,
> Christopher (NIH/NIA/IRP) [E]" <christopher.coletta at nih.gov>,
> "Ivan E. Cao-Berg" <icaoberg at andrew.cmu.edu>, Joaquin Correa
> <joaquincorrea at lbl.gov>
> Cc: OME Development <ome-devel at lists.openmicroscopy.org.uk>
> Subject: Re: [ome-devel] OMERO.features API development
>
>
>
> Hi all
>
>
> I had a discussion with Jason and Jean-Marie in Dundee a
> couple of weeks ago on how to make progress with the
> OMERO.features API. The problem of properly storing all the
> metadata we require is obviously extremely important, but it's
> going to take a while to figure out.
>
>
> So, to begin with we came up with the idea of a simplified API
> that would work as a client-side Python library and should be
> relatively simple to implement. I've described it on the
> OMERO.features wiki:
> https://github.com/ome/omero-features/wiki/API-Outline-V1
>
>
> The key points are:
>
>
> * Store the Image and/or ROI ID (which allows us to specify
> planes/channels/tiles without needing to add columns to the
> table), and feature names/values, using OMERO.tables.
> * Provide methods to store a feature row, retrieve a feature
> row by Image/ROI ID, and to select rows by querying feature
> values (simple comparison operators), ideally with a simple to
> understand syntax.
> * Implement as a client-side Python library.
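
The three key points above could look roughly like this from the
client side. This is only a sketch under my own assumptions: it keeps
rows in memory rather than in OMERO.tables, and FeatureRows, store,
fetch, select, and the ImageID/RoiID keys are hypothetical names, not
the API described on the wiki.

```python
import operator

# Simple comparison operators allowed in feature-value queries.
OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">=": operator.ge, ">": operator.gt}


class FeatureRows:
    """Hypothetical client-side store: one row per Image and/or ROI ID."""

    def __init__(self, feature_names):
        self.feature_names = list(feature_names)
        self.rows = []   # each row is a dict of IDs plus named feature values

    def store(self, values, image_id=None, roi_id=None):
        if image_id is None and roi_id is None:
            raise ValueError("need an Image ID and/or a ROI ID")
        if len(values) != len(self.feature_names):
            raise ValueError("one value per feature name is required")
        row = {"ImageID": image_id, "RoiID": roi_id}
        row.update(zip(self.feature_names, values))
        self.rows.append(row)

    def fetch(self, image_id=None, roi_id=None):
        # Retrieve rows by Image ID, ROI ID, or both.
        return [r for r in self.rows
                if (image_id is None or r["ImageID"] == image_id)
                and (roi_id is None or r["RoiID"] == roi_id)]

    def select(self, name, op, value):
        # Select rows by comparing one feature value against a constant.
        compare = OPS[op]
        return [r for r in self.rows if compare(r[name], value)]


table = FeatureRows(["area", "mean_intensity"])
table.store([120.0, 0.4], image_id=1)
table.store([80.0, 0.9], image_id=1, roi_id=55)
table.store([200.0, 0.7], image_id=2)

print(table.fetch(roi_id=55))
print([r["ImageID"] for r in table.select("area", ">", 100.0)])
```

Restricting select() to a fixed set of operators keeps the query
syntax easy to understand, at the cost of not supporting compound
conditions.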
>
>
> This means the more complicated requirements such as linking
> objects other than images/ROIs, more complicated feature
> metadata, and supporting both efficient bulk and random access
> will be left for the next iteration. As always, comments are
> welcome: if you think this is too simple to be useful, don't be
> afraid to say so.
>
>
> Cheers
>
>
> Simon
>
>
>
>
> On 08/05/2014 15:20, "Lee Kamentsky" <leek at broadinstitute.org>
> wrote:
>
>
> Hi all,
>
>
> On Thu, May 8, 2014 at 9:40 AM, Simon Li
> <s.p.li at dundee.ac.uk> wrote:
> Hi all
>
>
> I've started a GitHub repository for trying
> out some OMERO.features ideas based on what I
> mentioned in the last email:
> https://github.com/manics/omero-features
>
>
> There's not a great deal in there at the
> moment. It's just saving features into a local
> HDF5 file using PyTables, and example.py
> creates a table similar to that used by Pyslid
> (OMERO.searcher). timings.txt shows some rough
> run-times. Key-value row pairs are mapped to
> table columns; however, this means each row has
> to have the same keys. There's no simple way
> to have a key-value map per column, so for now
> I'm just storing multiple features in one
> column.
>
>
> This is easily convertible to OMERO.tables.
> Columns could be labelled using OMERO
> annotations (in 5.1 there's a new
> MapAnnotation), though it effectively means
> each group of features is stored separately
> and thus would need to be queried separately.
> Alternatively an auxiliary table could be used
> to store the per-column key-value pairs,
> similar to how column descriptions are
> currently stored in OMERO.tables.
>
>
> A major limitation is that database joins
> between OMERO and a feature-table aren't
> practical. For example, if each feature row is
> labelled with an image ID, and you want to
> select a subset of rows using an OMERO query,
> you have to pass a list of image IDs to the
> PyTables query function, which from my initial
> testing is very limited in the number of
> parameters it'll handle (I get a stack
> overflow if too many image IDs are passed).
>
>
> In practice this means you'd either need the
> feature table to contain any metadata
> necessary for selecting rows (e.g. dataset ID,
> experiment parameters, annotations) even if
> this means duplicating information held in
> OMERO, or split the query up (very
> inefficient). This is probably fine for people
> dealing with features in bulk where you might
> download all features for a screen for offline
> processing, not so good for real-time
> searching such as OMERO.searcher where you'd
> either need to store everything you need for
> pre-filtering search results in the table, or
> read all features and do the filtering
> afterwards.
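
The "split the query up" option mentioned above could be sketched
like this: break the ID list into batches and build one condition
string per batch, in the style of expression that PyTables accepts for
in-kernel queries. The column name ImageID and the batch size of 100
are assumptions for illustration; I have not measured where the real
limit lies, and running several queries is the inefficient route
described above.

```python
def id_conditions(column, ids, batch_size=100):
    """Yield one condition string per batch of IDs.

    Each string ORs together equality tests on `column`, so no single
    expression contains more than `batch_size` terms.
    """
    unique_ids = sorted(set(ids))
    for start in range(0, len(unique_ids), batch_size):
        batch = unique_ids[start:start + batch_size]
        yield " | ".join("({} == {})".format(column, int(i)) for i in batch)


# 250 image IDs become three conditions of 100, 100 and 50 terms.
conditions = list(id_conditions("ImageID", range(1, 251), batch_size=100))
print(len(conditions))
print(conditions[0][:48])
```

Each string would then be passed to the table query in turn and the
results concatenated on the client.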
>
>
> Probably OK as far as developing the API is
> concerned, but longer term it suggests we need
> some other storage mechanism. Some of you will
> remember Joaquin Correa from Paris last year.
> He's currently working on his own feature
> storage implementation at LBL, so potentially
> this is something we could look at for OMERO,
> and of course there are many other
> possibilities.
>
>
> People in other groups here (Broad Institute) are
> looking at MongoDB as an alternative to HDF5 - we are
> all sort of struggling with the same types of problems
> and I don't think anyone has found a solution. In
> CellProfiler, for HDF5, we maintain a dataset of
> per-image indexing information into the HDF5 datasets
> and perhaps that's an appropriate hybrid approach for
> OMERO - the join returns the slicing information
> needed for pulling the data out of the datasets and
> you then retrieve the data from HDF5. We store ~1K
> values per image per feature (one value per cell or
> other segmented object), so for us, each round-trip
> down the HDF5 stack deals with a reasonable amount of
> data. HDF5 slicing isn't as flexible as NumPy - you
> can ask for ranges, but not a list of individual
> dimension coordinates which would be what you'd want
> if you were returning data for a large number of rows.
> Fetching individual values from HDF5 is painfully slow
> for our scale of experiments and there is no really good
> solution. Maybe all I have to contribute on the topic is
> "I feel your pain" :-(
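
One way to work within range-only slicing is to collapse the wanted
row numbers into contiguous ranges and read each range as one slice.
The sketch below is an illustration only, not what CellProfiler does:
a plain Python list stands in for the HDF5 dataset, and to_ranges and
read_rows are hypothetical helper names.

```python
def to_ranges(indices):
    """Collapse row indices into sorted, half-open (start, stop) ranges."""
    ranges = []
    for i in sorted(set(indices)):
        if ranges and ranges[-1][1] == i:
            ranges[-1][1] = i + 1        # extend the current run
        else:
            ranges.append([i, i + 1])    # start a new run
    return [tuple(r) for r in ranges]


def read_rows(dataset, indices):
    """Read the requested rows using one slice per contiguous range.

    Rows come back in ascending index order, not request order.
    """
    rows = []
    for start, stop in to_ranges(indices):
        rows.extend(dataset[start:stop])
    return rows


data = list(range(100, 120))             # stand-in for an HDF5 dataset
print(to_ranges([3, 4, 5, 9, 10, 15]))
print(read_rows(data, [9, 3, 4, 5]))
```

This only pays off when the wanted rows cluster together; widely
scattered rows still cost one read each.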
>
>
>
> Simon
>
>
>
>
>
> On 24 Apr 2014, at 12:57, Lee Kamentsky
> <leek at broadinstitute.org> wrote:
>
> > Hi all,
> > Just chiming in, since we were mentioned...
> >
> > On Wed, Apr 23, 2014 at 5:10 PM, Simon Li
> > <s.p.li at dundee.ac.uk> wrote:
> > Hi all
> >
> > Now that OMERO 5.0 is out of the
> > way, and OMERO.searcher and WND-CHRM
> > are either released or very close to
> > release, I think it's time to
> > restart our OMERO.features
> > discussions.
> >
> > We got as far as the idea of a 2D
> > table with any number of key-value
> > pairs on each column and row, so for
> > example each row could be as simple
> > as (OmeroType: Image, OmeroId: 123),
> > or in the case of features which are
> > a function of multiple images or
> > channels (OmeroType: Image, OmeroId:
> > 123, Channel1: 0, Channel2: 3), etc.
> > Columns could for example be
> > (FeatureFamily: WndCharm, Feature:
> > Zernike). Each table cell could
> > either be a scalar or array.
> > Retrieving features could be done by
> > providing key-value pairs to be
> > matched.
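
As a conceptual sketch of the key-value idea above, and not of any
actual back-end, rows and columns can each carry a dictionary, with
retrieval matching the supplied pairs against both. The row and column
keys are taken from the example above; the second column ("Haralick"),
the cell values, and the function names are made up for illustration.

```python
def matches(metadata, query):
    """True if every key-value pair in `query` is present in `metadata`."""
    return all(metadata.get(key) == value for key, value in query.items())


def select(table, row_query, column_query):
    """Return the sub-table whose row and column metadata match the queries."""
    rows = [i for i, meta in enumerate(table["rows"])
            if matches(meta, row_query)]
    cols = [j for j, meta in enumerate(table["columns"])
            if matches(meta, column_query)]
    return [[table["values"][i][j] for j in cols] for i in rows]


table = {
    "rows": [
        {"OmeroType": "Image", "OmeroId": 123},
        {"OmeroType": "Image", "OmeroId": 123, "Channel1": 0, "Channel2": 3},
    ],
    "columns": [
        {"FeatureFamily": "WndCharm", "Feature": "Zernike"},
        {"FeatureFamily": "WndCharm", "Feature": "Haralick"},  # hypothetical
    ],
    "values": [[0.1, 0.2], [0.3, 0.4]],                        # made-up cells
}

print(select(table, {"OmeroId": 123}, {"Feature": "Zernike"}))
print(select(table, {"Channel2": 3}, {"FeatureFamily": "WndCharm"}))
```

Note how the query {"OmeroId": 123} matches both rows, which is an
instance of the ambiguity mentioned in the next paragraph.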
> >
> > All of this is still up for
> > discussion, especially since the
> > implementation of this interface
> > could be challenging and there's
> > some redundancy/ambiguity. Just to
> > be clear, the above is a conceptual
> > description of how the API would
> > appear to users, the actual back-end
> > could be completely different.
> >
> > Lee Kamentsky gave us a use case
> > just before Christmas [1], Chris
> > Coletta and Ivan Cao-Berg are
> > planning to summarise how they see
> > WND-CHARM and OMERO.searcher fitting
> > in. I know a few other people are
> > interested in this discussion, so
> > feel free to respond here or in the
> > forums.
> >
> >
> > For us, it's important to link features to
> > regions of interest, specifically
> > segmentations of whole cells and cellular
> > compartments. The other issues have to do
> > with scalability and the efficiency of
> > retrieving large data sets either by
> > selecting a few features for a large number
> > of images (e.g. up to on the order of
> > 1,000,000 images and 1,000 entries per
> > feature per image) or by selecting many or
> > all features associated with a subset of the
> > regions of interest.
> >
> >
> > We are also interested in recording tracking
> > data. What's needed here is the ability to
> > record a link between the region of interest
> > in one frame of a time-series stack with a
> > region of interest in a later frame and you
> > need the flexibility of a many-many
> > relationship to represent cell division and
> > potentially merging. I'm fairly confident
> > that you could encode that sort of thing in
> > a 2-D table which had columns referencing
> > both ROIs.
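
The tracking table described above could be sketched as a list of
two-column link rows, one per (earlier ROI, later ROI) pair. The ROI
IDs and the function name below are invented for illustration; the
point is only that division and merging both fit the same flat layout.

```python
from collections import defaultdict

# Each row links a ROI in one frame to a ROI in a later frame.
# ROI 10 divides into 20 and 21; ROIs 11 and 12 merge into 22.
links = [(10, 20), (10, 21), (11, 22), (12, 22)]


def build_lookup(link_rows):
    """Index the link rows in both directions."""
    later = defaultdict(list)     # earlier ROI -> ROIs it becomes
    earlier = defaultdict(list)   # later ROI -> ROIs it came from
    for source, target in link_rows:
        later[source].append(target)
        earlier[target].append(source)
    return later, earlier


later, earlier = build_lookup(links)
print(later[10])     # descendants of a dividing cell
print(earlier[22])   # ancestors of a merged object
```

Because each link is its own row, a ROI can appear any number of
times in either column, which gives the many-many relationship.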
> >
> >
> > Finally, we try to capture enough
> > information about the analysis to make it
> > reproducible - things like the pipeline used
> > for the analysis, the Git hash of the
> > software used to run the analysis and of
> > each image analyzed. I think all of that is
> > easily captured, though, in the tables and I
> > doubt we need any explicit functionality
> > devoted to that. It might be nice to be able
> > to annotate the table itself with attributes
> > in order to document the linking of the
> > analysis results to the experimental
> > protocol, but the linking could be
> > documented using columns in an
> > experiment-wide table.
> >
> > A few of us are planning to meet up
> > at the OME Paris meeting; if you're
> > interested, drop me an email.
> >
> > Thanks
> >
> > Simon
> >
> > [1]
> > http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2013-November/002573.html
> >
> >
> > On 7 Nov 2013, at 14:20, Simon Li
> > <s.p.li at dundee.ac.uk> wrote:
> >
> > > Some notes from our meeting yesterday:
> > > http://www.openmicroscopy.org/site/community/minutes/minigroup/omero-features-meetings/2013-11-06-omero-features-google-hangout
> > >
> > > Summary:
> > > We're thinking of representing features as a 2D array, with
> > > metadata stored as key-value maps attached to the array, or to
> > > individual columns or rows. These keys could describe things such
> > > as the feature name (column), sample metadata (row), algorithm
> > > parameters, calculation pipelines, etc.
> > >
> > > This should work as an OMERO API: in order to retrieve features
> > > you'd pass in a set of key-value pairs, for instance to specify
> > > which features you want and which images/ROIs etc., and OMERO
> > > would handle the logic and return the feature table(s) matching
> > > those parameters. Since everyone has different requirements the
> > > keys could be anything; however, we're trying to define a small
> > > set of standard keys, so any suggestions are very welcome.
> > >
> > > Outside of OMERO we still need a format for transporting
> > > features, so we're thinking some form of HDF5.
> > >
> > > Simon
> > >
> > >
> > > The University of Dundee is a registered Scottish Charity, No: SC015096
> > > _______________________________________________
> > > ome-devel mailing list
> > > ome-devel at lists.openmicroscopy.org.uk
> > > http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
> >
> >
> >
> >
>
>
>
>
>
>
>
>