[ome-devel] OMERO.features API development
Lee Kamentsky
leek at broadinstitute.org
Mon Nov 10 12:24:49 GMT 2014
On Fri, Nov 7, 2014 at 6:44 PM, Simon Li <s.p.li at dundee.ac.uk> wrote:
> Hi Lee
>
> I haven't done any proper benchmarking, though I did attempt to create a
> feature set for an image with one ROI per pixel; the bottleneck was the
> time taken to create the ROI in OMERO. When it comes to retrieval, if
> you've stored features for a ROI, both the ROI ID and the associated Image
> ID should be present in the table, so retrieval should be fast until you
> hit the limits of PyTables/OMERO.tables.
>
Excellent, Simon, that's the sort of thing I was hoping to hear. A formal
benchmark is always nice to have, but it is just one more thing to pile onto
a to-do list. It's reassuring that you tried something at the limits of
outrageous and saw that its effects were what you expected.
>
> Simon
>
> On Fri, 2014-11-07 at 15:30 -0500, Lee Kamentsky wrote:
> > Hi Simon, just a short comment - how does it scale with the number of
> > ROIs per image? It looks like the initial use case is one ROI / image
> > - what about 1000?
> >
> > On Fri, Nov 7, 2014 at 12:18 PM, Simon Li <s.p.li at dundee.ac.uk> wrote:
> > Hi all
> >
> >
> > It’s taken a bit longer than anticipated, but here’s a first
> > implementation for the features API:
> > https://github.com/ome/omero-features
> >
> >
> > It’s more or less what I described in my last email. To
> > start with, I recommend looking at the README and example.py in
> > the repository root to see how it all works. The key features
> > are:
> > * Store and retrieve features using an Image-ID or
> > Roi-ID.
> > * Each feature set consists of a number of named
> > features and is per-user-per-group to avoid
> > permissions problems.
> > * The underlying storage format is OMERO.tables which
> > has some limitations, but one of the aims of this work
> > is to figure out exactly what we want to replace it
> > with.
> > As an example of using it I’ve created a new branch of
> > OMERO.wndcharm:
> > https://github.com/manics/omero-wndcharm/tree/omero-features
> > Note that only the original WND-CHARM image features are stored
> > using the features API; other information such as
> > classification labels, results, and feature weights can’t be
> > stored at present. They would require additional columns to
> > store row metadata, which is an obvious next step.
> >
> >
> > There’s obviously a lot more that could be handled by the API,
> > but this is the sort of thing that really needs input from
> > potential users so take a look and let me know what you think,
> > either on this list, as a GitHub issue, or on the Wiki.
> >
> >
> > Cheers
> >
> >
> > Simon
> >
> >
> > From: Simon Li <s.p.li at dundee.ac.uk>
> > Date: Wednesday, 27 August 2014 15:50
> > To: Lee Kamentsky <leek at broadinstitute.org>, "Coletta,
> > Christopher (NIH/NIA/IRP) [E]" <christopher.coletta at nih.gov>,
> > "Ivan E. Cao-Berg" <icaoberg at andrew.cmu.edu>, Joaquin Correa
> > <joaquincorrea at lbl.gov>
> > Cc: OME Development <ome-devel at lists.openmicroscopy.org.uk>
> > Subject: Re: [ome-devel] OMERO.features API development
> >
> >
> >
> > Hi all
> >
> >
> > I had a discussion with Jason and Jean-Marie in Dundee a
> > couple of weeks ago on how to make progress with the
> > OMERO.features API. The problem of properly storing all the
> > metadata we require is obviously extremely important, but it's
> > going to take a while to figure out.
> >
> >
> > So, to begin with we came up with the idea of a simplified API
> > that would work as a client-side Python library and should be
> > relatively simple to implement. I've described it on the
> > OMERO.features wiki:
> > https://github.com/ome/omero-features/wiki/API-Outline-V1
> >
> >
> > The key points are:
> >
> >
> > * Store the Image and/or ROI ID (which allows us to specify
> > planes/channels/tiles without needing to add columns to the
> > table), and feature names/values, using OMERO.tables.
> > * Provide methods to store a feature row, retrieve a feature
> > row by Image/ROI ID, and to select rows by querying feature
> > values (simple comparison operators), ideally with a simple to
> > understand syntax.
> > * Implement as a client-side Python library.
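
A minimal sketch of the three operations in the key points above (store a row, retrieve by Image/ROI ID, select rows by comparing feature values). It uses a plain in-memory list rather than OMERO.tables, and the class name, method names and feature names are all hypothetical; none of them are taken from the omero-features code.

```python
import operator

# Hypothetical in-memory stand-in for a feature table. The names here are
# illustrative only and are not the omero-features API.
_OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
        ">=": operator.ge, ">": operator.gt}


class FeatureTable:
    def __init__(self, feature_names):
        self.feature_names = list(feature_names)
        self.rows = []  # each row is (image_id, roi_id, [values])

    def store(self, image_id, roi_id, values):
        """Append one feature row keyed by Image ID and/or ROI ID."""
        if len(values) != len(self.feature_names):
            raise ValueError("expected %d values" % len(self.feature_names))
        self.rows.append((image_id, roi_id, list(values)))

    def fetch_by_image(self, image_id):
        """Return every row stored against the given Image ID."""
        return [r for r in self.rows if r[0] == image_id]

    def fetch_by_roi(self, roi_id):
        """Return every row stored against the given ROI ID."""
        return [r for r in self.rows if r[1] == roi_id]

    def select(self, feature_name, op, threshold):
        """Return rows where `feature_name <op> threshold` holds."""
        col = self.feature_names.index(feature_name)
        compare = _OPS[op]
        return [r for r in self.rows if compare(r[2][col], threshold)]


table = FeatureTable(["area", "mean_intensity"])
table.store(image_id=1, roi_id=None, values=[120.0, 0.4])
table.store(image_id=1, roi_id=10, values=[35.0, 0.9])
table.store(image_id=2, roi_id=11, values=[80.0, 0.2])
bright = table.select("mean_intensity", ">", 0.3)
```

Keeping both IDs in every row is what lets the same table answer per-image and per-ROI lookups without extra columns.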
> >
> >
> > This means the more complicated requirements such as linking
> > objects other than images/ROIs, more complicated feature
> > metadata, and supporting both efficient bulk and random access
> > will be left for the next iteration. As always, comments are
> > welcome; if you think this is too simple to be useful, don't be
> > afraid to say so.
> >
> >
> > Cheers
> >
> >
> > Simon
> >
> >
> >
> >
> > On 08/05/2014 15:20, "Lee Kamentsky" <leek at broadinstitute.org>
> > wrote:
> >
> >
> > Hi all,
> >
> >
> > On Thu, May 8, 2014 at 9:40 AM, Simon Li
> > <s.p.li at dundee.ac.uk> wrote:
> > Hi all
> >
> >
> > I've started a GitHub repository for trying
> > out some OMERO.features ideas based on what I
> > mentioned in the last email:
> > https://github.com/manics/omero-features
> >
> >
> > There's not a great deal in there at the
> > moment. It's just saving features into a local
> > HDF5 file using PyTables, and example.py
> > creates a table similar to that used by Pyslid
> > (OMERO.searcher). timings.txt shows some rough
> > run-times. Key-value row pairs are mapped to
> > table columns; however, this means each row has
> > to have the same keys. There's no simple way
> > to have a key-value map per column, so for now
> > I'm just storing multiple features in one
> > column.
> >
> >
> > This is easily convertible to OMERO.tables;
> > columns could be labelled using OMERO
> > annotations (in 5.1 there's a new
> > MapAnnotation), though it effectively means
> > each group of features is stored separately
> > and thus would need to be queried separately.
> > Alternatively an auxiliary table could be used
> > to store the per-column key-value pairs,
> > similar to how column descriptions are
> > currently stored in OMERO.tables.
> >
> >
> > A major limitation is that database joins
> > between OMERO and a feature-table aren't
> > practical. For example, if each feature row is
> > labelled with an image ID, and you want to
> > select a subset of rows using an OMERO query,
> > you have to pass a list of image IDs to the
> > PyTables query function, which from my initial
> > testing is very limited in the number of
> > parameters it'll handle (I get a stack
> > overflow if too many image IDs are passed).
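
One way to picture the "split the query up" workaround is to batch the image IDs so that each condition string stays short. The sketch below only builds the strings, in the `(col == 1) | (col == 2)` form that PyTables condition expressions use. The column name `ImageID` and the batch size are assumptions, since the thread does not say where the limit lies.

```python
def batched_conditions(column, ids, batch_size=100):
    """Yield condition strings, each covering at most `batch_size` IDs."""
    ids = list(ids)
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        # OR together one equality test per ID in this chunk
        yield " | ".join("(%s == %d)" % (column, i) for i in chunk)


# Five hypothetical image IDs, two per query
conditions = list(batched_conditions("ImageID", range(1, 6), batch_size=2))
for condition in conditions:
    print(condition)
```

Each string would then be run as a separate table query and the results concatenated, which is the extra cost behind the "very inefficient" remark in the next paragraph.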
> >
> >
> > In practice this means you'd either need the
> > feature table to contain any metadata
> > necessary for selecting rows (e.g. dataset ID,
> > experiment parameters, annotations) even if
> > this means duplicating information held in
> > OMERO, or split the query up (very
> > inefficient). This is probably fine for people
> > dealing with features in bulk, where you might
> > download all features for a screen for offline
> > processing, but not so good for real-time
> > searching such as OMERO.searcher, where you'd
> > either need to store everything you need for
> > pre-filtering search results in the table, or
> > read all features and do the filtering
> > afterwards.
> >
> >
> > Probably OK as far as developing the API is
> > concerned, but longer term it suggests we need
> > some other storage mechanism. Some of you will
> > remember Joaquin Correa from Paris last year.
> > He's currently working on his own feature
> > storage implementation at LBL, so potentially
> > this is something we could look at for OMERO,
> > and of course there are many other
> > possibilities.
> >
> >
> > People in other groups here (Broad Institute) are
> > looking at MongoDB as an alternative to HDF5 - we are
> > all sort of struggling with the same types of problems
> > and I don't think anyone has found a solution. In
> > CellProfiler, for HDF5, we maintain a dataset of
> > per-image indexing information into the HDF5 datasets
> > and perhaps that's an appropriate hybrid approach for
> > OMERO - the join returns the slicing information
> > needed for pulling the data out of the datasets and
> > you then retrieve the data from HDF5. We store ~1K
> > values per image per feature (one value per cell or
> > other segmented object), so for us, each round-trip
> > down the HDF5 stack deals with a reasonable amount of
> > data. HDF5 slicing isn't as flexible as NumPy: you
> > can ask for ranges, but not a list of individual
> > dimension coordinates, which would be what you'd want
> > if you were returning data for a large number of rows.
> > Fetching individual values from HDF5 is painfully slow
> > for our scale of experiments. There's no really good solution;
> > maybe all I have to contribute on the topic is "I feel
> > your pain" :-(
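
A rough sketch of the hybrid idea described above, with a plain Python list standing in for an HDF5 dataset. A per-image index maps each image to a (start, stop) row range, and a helper merges arbitrary row numbers into contiguous ranges so that each read is a range request rather than a single-value fetch. The index layout, the IDs and the helper name are assumptions for illustration, not CellProfiler's actual format.

```python
def coalesce_ranges(row_indices):
    """Merge row indices into sorted, half-open (start, stop) ranges."""
    ranges = []
    for idx in sorted(set(row_indices)):
        if ranges and ranges[-1][1] == idx:
            ranges[-1][1] = idx + 1        # extend the current run
        else:
            ranges.append([idx, idx + 1])  # start a new run
    return [tuple(r) for r in ranges]


# Hypothetical per-image index: image ID -> (start, stop) rows in the dataset
image_index = {101: (0, 3), 102: (3, 5), 103: (5, 9)}
# Stand-in for one feature's dataset, one value per segmented object
data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# The join against the index yields slicing information for one image...
start, stop = image_index[102]
cells_of_102 = data[start:stop]

# ...and scattered rows are grouped into as few range reads as possible
ranges = coalesce_ranges([7, 0, 1, 2, 8, 4])
values = [v for a, b in ranges for v in data[a:b]]
```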
> >
> >
> >
> > Simon
> >
> >
> >
> >
> >
> > On 24 Apr 2014, at 12:57, Lee Kamentsky
> > <leek at broadinstitute.org> wrote:
> >
> > > Hi all,
> > > Just chiming in, since we were mentioned...
> > >
> > > On Wed, Apr 23, 2014 at 5:10 PM, Simon Li
> > > <s.p.li at dundee.ac.uk> wrote:
> > > Hi all
> > >
> > > Now that OMERO 5.0 is out of the
> > > way, and OMERO.searcher and WND-CHARM
> > > are either released or very close to
> > > release, I think it's time to
> > > restart our OMERO.features
> > > discussions.
> > >
> > > We got as far as the idea of a 2D
> > > table with any number of key-value
> > > pairs on each column and row, so for
> > > example each row could be as simple
> > > as (OmeroType: Image, OmeroId: 123),
> > > or in the case of features which are
> > > a function of multiple images or
> > > channels (OmeroType: Image, OmeroId:
> > > 123, Channel1: 0, Channel2: 3), etc.
> > > Columns could for example be
> > > (FeatureFamily: WndCharm, Feature:
> > > Zernike). Each table cell could
> > > either be a scalar or array.
> > > Retrieving features could be done by
> > > providing key-value pairs to be
> > > matched.
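
The conceptual table above can be sketched with plain dictionaries: one key-value map per row, one per column, and retrieval by matching a query against both. The row and column keys are taken from the examples in this message. The second feature name ("Haralick"), the cell values and the function names are invented for illustration, and as noted below the real back-end could look nothing like this.

```python
def matches(metadata, query):
    """True if every key-value pair in `query` is present in `metadata`."""
    return all(metadata.get(k) == v for k, v in query.items())


# One key-value map per row and per column (values are made up)
row_meta = [
    {"OmeroType": "Image", "OmeroId": 123},
    {"OmeroType": "Image", "OmeroId": 123, "Channel1": 0, "Channel2": 3},
]
col_meta = [
    {"FeatureFamily": "WndCharm", "Feature": "Zernike"},
    {"FeatureFamily": "WndCharm", "Feature": "Haralick"},
]
cells = [[0.1, 0.2],
         [0.3, 0.4]]


def fetch(row_query, col_query):
    """Return the sub-table whose row and column metadata match the queries."""
    rows = [i for i, m in enumerate(row_meta) if matches(m, row_query)]
    cols = [j for j, m in enumerate(col_meta) if matches(m, col_query)]
    return [[cells[i][j] for j in cols] for i in rows]


zernike = fetch({"OmeroId": 123}, {"Feature": "Zernike"})
two_channel = fetch({"Channel2": 3}, {})
```

An empty query matches everything, so omitting the column query returns all features for the selected rows.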
> > >
> > > All of this is still up for
> > > discussion, especially since the
> > > implementation of this interface
> > > could be challenging and there's
> > > some redundancy/ambiguity. Just to
> > > be clear, the above is a conceptual
> > > description of how the API would
> > > appear to users, the actual back-end
> > > could be completely different.
> > >
> > > Lee Kamentsky gave us a use case
> > > just before Christmas [1], and Chris
> > > Coletta and Ivan Cao-Berg are
> > > planning to summarise how they see
> > > WND-CHARM and OMERO.searcher fitting
> > > in. I know a few other people are
> > > interested in this discussion, so
> > > feel free to respond here or in the
> > > forums.
> > >
> > >
> > > For us, it's important to link features to
> > > regions of interest, specifically
> > > segmentations of whole cells and cellular
> > > compartments. The other issues have to do
> > > with scalability and the efficiency of
> > > retrieving large data sets either by
> > > selecting a few features for a large number
> > > of images (e.g. up to on the order of
> > > 1,000,000 images and 1,000 entries per
> > > feature per image) or by selecting many or
> > > all features associated with a subset of the
> > > regions of interest.
> > >
> > >
> > > We are also interested in recording tracking
> > > data. What's needed here is the ability to
> > > record a link between the region of interest
> > > in one frame of a time-series stack and a
> > > region of interest in a later frame, and you
> > > need the flexibility of a many-to-many
> > > relationship to represent cell division and
> > > potentially merging. I'm fairly confident
> > > that you could encode that sort of thing in
> > > a 2-D table which had columns referencing
> > > both ROIs.
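
As a sketch of how such a two-column link table might look, each row below pairs a ROI in one frame with a ROI in a later frame. Division then shows up as one source ROI with several targets, and merging as one target with several sources. The ROI IDs and variable names are invented for illustration.

```python
from collections import defaultdict

# Each row links a ROI in one frame to a ROI in a later frame.
links = [
    (1, 2),  # ROI 1 continues as ROI 2
    (2, 3),  # ROI 2 divides...
    (2, 4),  # ...into ROIs 3 and 4
    (3, 5),  # ROIs 3 and 4 merge...
    (4, 5),  # ...into ROI 5
]

children = defaultdict(list)  # source ROI -> later ROIs
parents = defaultdict(list)   # later ROI -> source ROIs
for src, dst in links:
    children[src].append(dst)
    parents[dst].append(src)

# More than one child means a division; more than one parent means a merge
divisions = sorted(r for r, kids in children.items() if len(kids) > 1)
merges = sorted(r for r, ps in parents.items() if len(ps) > 1)
```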
> > >
> > >
> > > Finally, we try to capture enough
> > > information about the analysis to make it
> > > reproducible - things like the pipeline used
> > > for the analysis, the Git hash of the
> > > software used to run the analysis and of
> > > each image analyzed. I think all of that is
> > > easily captured, though, in the tables and I
> > > doubt we need any explicit functionality
> > > devoted to that. It might be nice to be able
> > > to annotate the table itself with attributes
> > > in order to document the linking of the
> > > analysis results to the experimental
> > > protocol, but the linking could be
> > > documented using columns in an
> > > experiment-wide table.
> > >
> > > A few of us are planning to meet up
> > > at the OME Paris meeting; if you're
> > > interested, drop me an email.
> > >
> > > Thanks
> > >
> > > Simon
> > >
> > > [1] http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2013-November/002573.html
> > >
> > >
> > > On 7 Nov 2013, at 14:20, Simon Li
> > > <s.p.li at dundee.ac.uk> wrote:
> > >
> > > > Some notes from our meeting yesterday:
> > > >
> > > > http://www.openmicroscopy.org/site/community/minutes/minigroup/omero-features-meetings/2013-11-06-omero-features-google-hangout
> > > >
> > > > Summary:
> > > > We're thinking of representing features as a 2D array, with
> > > > metadata stored as key-value maps attached to the array, or to
> > > > individual columns or rows. These keys could describe things such
> > > > as the feature name (column), sample metadata (row), algorithm
> > > > parameters, calculation pipelines, etc.
> > > >
> > > > This should work as an OMERO API: in order to retrieve features
> > > > you'd pass in a set of key-value pairs, for instance to specify
> > > > which features you want and which images/ROIs etc., and OMERO
> > > > would handle the logic and return the feature table(s) matching
> > > > those parameters. Since everyone has different requirements the
> > > > keys could be anything; however, we're trying to define a small
> > > > set of standard keys. Any suggestions are very welcome.
> > > >
> > > > Outside of OMERO we still need a format for transporting
> > > > features, so we're thinking of some form of HDF5.
> > > >
> > > > Simon
> > > >
> > > >
> > > > The University of Dundee is a
> > > registered Scottish Charity, No:
> > > SC015096
> > > >
> > >
> _______________________________________________
> > > > ome-devel mailing list
> > > >
> > > ome-devel at lists.openmicroscopy.org.uk
> > > >
> > >
> http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
> > >
> > >
> > >
> > >
> >
> >
> >
> >
> >
> >
> >
> >
>
>
>