[ome-devel] OMERO.features: Development of a new API for storing image features
Lee Kamentsky
leek at broadinstitute.org
Fri Jul 5 16:57:58 BST 2013
Hi all,
I think it's great that Bob Murphy's group has implemented pyslic and
pyslid in an open-source framework like OMERO. It looks like a substantial
body of work. I'm wondering, however, what needs to be done to make it a
general-purpose framework, especially from the perspective of our group's
experience with CellProfiler. Also, Simon,
thanks for moving this forward.
My reading of the pyslic code is that it supports a nuclear stain and a
protein stain and calculates a standard set of per-image and per-object
features (although I haven't quite figured out the storage mechanism for
the object features). This is adequate for a large class of experiments
involving two-color fluorescently-labeled samples and it's likely the
methods are robust, but our experience has been that experimental protocols
can be more varied (multiple protein stains, brightfield images) and the
biological questions can require additional image preprocessing to
highlight the structures of interest, often requiring tuning parameters
specific to the structure scale. Because of this, I think that the
framework needs a modular architecture that supports development of new
algorithms by computational researchers and configuration by end users,
and that it needs to extend beyond a curated code base to allow for innovation.
Personally, I'm really pleased that the framework is in Python because it
aligns well with our group, but perhaps this is limiting for the ImageJ
community and perhaps some portion of CellProfiler's bridge between Python
and ImageJ could be adapted to supply the connection.
I think that we do need a platform for innovation and the keys to that are
interoperability, standards, and a model of the analysis that is flexible
enough to describe our community's experiments and that captures the
analysis protocol in a reproducible manner. I'm going to outline my
perspective on the model here, drawing on our group's experience with
CellProfiler, and try to keep it brief. I see the components of the model
being:
* Fields of view - N-dimensional spaces (X, Y, T, Z, spectral) representing
an imaging site
* Images - acquired image data on a field of view (with acquisition
metadata) or similar data produced by algorithms such as filters or
morphological operations.
* Segmentations - defining multiple regions of interest on the fields of
view or on (hyper)planes of the fields of view
* Relationships between segmented regions - links between segmented regions
either within segmentations or across them. Examples might be time-lapse
cell tracking, associations between nuclear and cellular segmentations or
groupings of organelle segmentations within a cell.
* Measurements - data computed on the images, segmentations and
relationships within a field of view. My take on this is that a measurement
produces a numeric feature value per image or per segmentation region, but
perhaps that's too narrow.
* Protocol - a description of how to perform the analysis. I think the key
elements are a link to the OMERO screen and a list of the parameterized
algorithms to be performed. The screen provides the image inputs to the
algorithms (the available image acquisition channels), and the algorithms
themselves produce images, segmentations, relationships and measurements
that can serve as inputs to other algorithms in the protocol.
Algorithms will often be parameterizable by the user and these parameters
should be captured by the protocol. Ideally, the protocol should capture
the versions of the algorithms using a mechanism such as a Git hash. In
CellProfiler, we have algorithms that produce an aggregated image based on
samples from many fields of view, for instance an estimate of the differences
in signal magnitude across the field of view caused by non-uniform
illumination, so algorithms might take stacks of images as inputs, and these
stacks might span several fields of view.
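A minimal sketch of how the components listed above could be written down as
plain Python data classes. Every class name, field name and value here is
hypothetical and chosen only for illustration; none of it comes from pyslic,
PySLID, CellProfiler or OMERO.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FieldOfView:
    """An imaging site; axes might be ("X", "Y", "Z", "T", "C")."""
    fov_id: int
    axes: Tuple[str, ...]


@dataclass
class Image:
    """Acquired or algorithm-derived pixel data on one field of view."""
    name: str
    fov_id: int
    acquired: bool = True  # False for filter or morphology outputs


@dataclass
class Segmentation:
    """A named set of labelled regions on one field of view."""
    name: str
    fov_id: int
    region_ids: List[int]


@dataclass
class Relationship:
    """A typed link between two regions, within or across segmentations."""
    kind: str                 # e.g. "contained_in" or "tracked_to"
    source: Tuple[str, int]   # (segmentation name, region id)
    target: Tuple[str, int]


@dataclass
class Measurement:
    """One numeric feature value, per image or per segmented region."""
    feature: str
    value: float
    fov_id: int
    region: Optional[Tuple[str, int]] = None  # None means per-image


@dataclass
class Step:
    """One parameterized algorithm in a protocol."""
    algorithm: str
    version: str                  # e.g. a Git commit hash
    parameters: Dict[str, float]
    inputs: List[str]
    outputs: List[str]


@dataclass
class Protocol:
    """Links a screen to the list of algorithms run on it."""
    screen_id: int
    steps: List[Step] = field(default_factory=list)


# Example with placeholder values: one nucleus inside one cell.
fov = FieldOfView(fov_id=1, axes=("X", "Y", "C"))
nuclei = Segmentation("nuclei", fov.fov_id, region_ids=[1])
cells = Segmentation("cells", fov.fov_id, region_ids=[1])
link = Relationship("contained_in", ("nuclei", 1), ("cells", 1))
area = Measurement("area", 412.0, fov.fov_id, region=("nuclei", 1))
protocol = Protocol(screen_id=101)
protocol.steps.append(
    Step("otsu_threshold", "abc1234", {"scale": 2.0}, ["DAPI"], ["nuclei"])
)
```

Keeping the region reference optional on Measurement is one way to cover both
the per-image and the per-region case with a single record type.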
As far as the actual mechanics, I see OMERO or similar using the protocol
as a dependency graph, fetching the algorithms using some
community-standard mechanism (maven? pip?), providing inputs as specified
by the protocol and harvesting the outputs for the database and for
dependent algorithms. I have some detailed concerns about algorithm
input/output introspection and discovery, but ImageJ 2.0's plugin
introspection protocol (@Parameter) is a good starting point (thanks ImageJ
2.0).
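To illustrate the dependency-graph idea, here is a small sketch that orders
protocol steps so that every input is produced before it is consumed. It
assumes each step simply declares the names of its inputs and outputs; the
step names and the dictionary layout are hypothetical, and the ordering uses
the standard library's graphlib (Python 3.9+), not anything OMERO provides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(steps):
    """Return step names ordered so every input is produced first.

    `steps` maps a step name to (inputs, outputs). Input names that no
    step produces are treated as acquired channels from the screen.
    """
    # Map each output name to the step that produces it.
    producer = {}
    for name, (_inputs, outputs) in steps.items():
        for out in outputs:
            producer[out] = name
    # For each step, collect the steps it depends on.
    graph = {
        name: {producer[i] for i in inputs if i in producer}
        for name, (inputs, _outputs) in steps.items()
    }
    # static_order() raises graphlib.CycleError on circular protocols.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical protocol, deliberately listed out of order.
STEPS = {
    "measure_area": (["nuclei"], ["nuclear_area"]),
    "seeded_watershed": (["DAPI_corrected", "nuclei_mask"], ["nuclei"]),
    "otsu_threshold": (["DAPI_corrected"], ["nuclei_mask"]),
    "illumination_correction": (["DAPI"], ["DAPI_corrected"]),
}

order = execution_order(STEPS)
print(order)
```

A runner could walk this order, fetch each algorithm, pass it the named
inputs and store the named outputs for the steps that follow.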
OK - somewhat CellProfiler-centric perhaps, but the nice thing about OMERO
is that it is a relational database and the protocol is the thing itself -
not a description of the experiment, but a mineable map of how each number
is produced, especially if the protocol pieces are described relationally in
the database. I think the above is an ambitious undertaking, but look at
the result! Researchers can trade protocols which produce robust and
comparable values (not just "nuclear area", but the nuclear area after
illumination correction and segmentation using Otsu thresholding and a
seeded watershed of HeLa cells stained with DAPI). Developers can publish
their method in OMERO and possibly OMERO itself can generate citations
based on a protocol, leading to better attribution of our work. And OMERO
itself becomes a sustainable platform for analysis with a well-defined
interoperable API for image processing.
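As a rough sketch of what "a mineable map of how each number is produced"
could look like, the function below walks backwards from a measurement name
to the chain of algorithm steps behind it. The step records, algorithm names
and version strings are placeholders invented for this example; a real
protocol would presumably live in database tables rather than a Python list.

```python
def lineage(target, steps):
    """Return the steps that contribute to `target`, upstream first."""
    producer = {out: s for s in steps for out in s["outputs"]}
    ordered, seen = [], set()

    def visit(name):
        step = producer.get(name)
        if step is None or step["algorithm"] in seen:
            return  # acquired channel, or step already recorded
        seen.add(step["algorithm"])
        for inp in step["inputs"]:
            visit(inp)
        ordered.append(step)  # appended only after all of its inputs

    visit(target)
    return ordered


def describe(target, steps):
    """Render the lineage as 'algorithm@version' joined by arrows."""
    chain = lineage(target, steps)
    return " -> ".join(f"{s['algorithm']}@{s['version']}" for s in chain)


# Hypothetical protocol; the version strings are placeholders, not real hashes.
STEPS = [
    {"algorithm": "illumination_correction", "version": "v1",
     "inputs": ["DAPI"], "outputs": ["DAPI_corrected"]},
    {"algorithm": "otsu_threshold", "version": "v2",
     "inputs": ["DAPI_corrected"], "outputs": ["nuclei_mask"]},
    {"algorithm": "seeded_watershed", "version": "v3",
     "inputs": ["DAPI_corrected", "nuclei_mask"], "outputs": ["nuclei"]},
    {"algorithm": "measure_area", "version": "v4",
     "inputs": ["nuclei"], "outputs": ["nuclear_area"]},
]

print(describe("nuclear_area", STEPS))
```

Two values called "nuclear area" would then be comparable only if their
lineages match, which is the point of recording the protocol with each number.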
Hope this all gives things a positive lift, thx for reading this far,
--Lee
On Fri, Jul 5, 2013 at 10:03 AM, Simon Li <s.p.li at dundee.ac.uk> wrote:
> Hi everyone
>
> It was great to see so many people interested in OMERO.searcher and
> WND-CHRM at the Paris meeting, both those who were interested in installing
> it on their own systems and also those of you who were interested in
> developing other analysis algorithms for use with OMERO.
>
> One of the main points that came up was that OMERO should provide a single
> API for storing and calculating image features. Robert Murphy's group at
> CMU have already developed PySLID [http://github.com/icaoberg/pyslid], a
> Python module for calculating and storing features used with
> OMERO.searcher, so I'd like to propose we bring this into the
> openmicroscopy GitHub organisation, and rename it to OMERO.features (other
> suggestions are welcome).
>
> Then there's the much bigger task of modifying the module to cater for
> everyone's requirements. I can see several potential issues, including how
> we handle multiple channels, z-slices, timepoints, ROIs, etc., since features
> can be calculated for these individually or as a whole.
>
> If anyone has any thoughts or comments on what they'd like to see it'd be
> great if you could share them with the rest of this list, or if you prefer
> on our forums.
>
> Best wishes
>
> Simon
>
>
>
> The University of Dundee is a registered Scottish Charity, No: SC015096
>