[ome-devel] Versioning objects e.g. features in OMERO
Lee Kamentsky
leek at broadinstitute.org
Wed Jan 22 15:16:26 GMT 2014
Thanks, Simon.
Our general approach is to assume that CellProfiler measurements are
reproducible only when the analysis is identical: the same software
revision, pipeline and image data. Using a classifier across experiments
often introduces an unacceptable level of variability anyway (biological
protocol, robotics, microscope setup, etc.), so we usually apply the same
statistical methods to the new data. In cases where we do try to compare
across experimental batches, we always (I think) use the same software
revision on both batches and will rerun the analysis on old data if we need
to change software versions.
More discussion below...
On Wed, Jan 22, 2014 at 9:41 AM, Simon Li <s.p.li at dundee.ac.uk> wrote:
> Hi all
>
> I've been working with Chris Coletta on getting the latest version of
> WND-CHARM to work with OMERO, and I've stumbled into the problem of how to
> version objects such as calculated features in OMERO. I thought this might
> be relevant for others on the list, and also ties in with the ongoing
> OMERO.features discussion.
> Example use cases:
>
> 1. I've calculated and stored some features with v1.5 of WND-CHARM. Later
> I upgrade WND-CHARM to v2.0. How do I know that my v1.5 features may need
> to be either converted or re-calculated to v2.0?
We always recalculate. Rerunning an analysis isn't very expensive for us.
Conversion generally doesn't work - the effects of the version change can't
be reduced to a mapping of old to new values.
> Similarly for a trained classifier.
>
Often the ground truth / biologist annotations can be reapplied to the new
data to train new classifiers. For WND-CHARM, I'm guessing the ground truth
is positive and negative controls and it should be very easy to retrain.
>
> 2. I've upgraded WND-CHARM. Can I run a script which finds all WND-CHARM
> 1.5 objects and converts them into v2.0 format?
>
I'd hope that WND-CHARM has annotated each measurement column with the
version. It should be possible to iterate over the columns and determine
which measurements require an update. For both CellProfiler and WND-CHARM,
the cost of updating a single measurement might be of the same order of
magnitude as, or identical to, rerunning the entire analysis, and in
general either all or none of the measurements will require an update.
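
A minimal sketch of that column check, assuming each measurement column
carries the name and version of the software that produced it. The
MeasurementColumn type, the stale_columns helper and the feature names are
hypothetical illustrations, not part of WND-CHARM, CellProfiler or OMERO:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementColumn:
    """Hypothetical per-column metadata: which software/version produced it."""
    name: str
    software: str
    version: str


def stale_columns(columns, software, current_version):
    """Return names of columns made by `software` at a different version."""
    return [
        col.name
        for col in columns
        if col.software == software and col.version != current_version
    ]


# Hypothetical feature columns stored alongside an image or plate.
columns = [
    MeasurementColumn("Haralick_Texture", "WND-CHARM", "1.5"),
    MeasurementColumn("Zernike_Coefficients", "WND-CHARM", "1.5"),
    MeasurementColumn("AreaShape_Area", "CellProfiler", "2.1.0.rc12"),
]

stale = stale_columns(columns, "WND-CHARM", "2.0")
# Since updating one column costs about as much as a full rerun,
# any stale column is treated as a reason to rerun the whole analysis.
rerun_everything = bool(stale)
print(stale, rerun_everything)
```

Because the expected outcome is all-or-none, the list is mainly useful as a
yes/no trigger for a full rerun rather than as a per-column work queue.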
> We might also want to think about versioning the same object from a user's
> perspective, e.g. if they redo their analysis with different parameters and
> save it into OMERO they might want to keep their previous versions.
>
Yup yup - with CP, I'd probably do a two-level annotation and give each
measurement an experiment ID and store both the software version and
parameterization as measurements indexed by experiment ID. That would give
you only a single value to check.
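
A rough sketch of that two-level layout, using plain dictionaries rather
than any real OMERO or CellProfiler API. The experiment IDs, parameter
names and measurement values are made up for illustration:

```python
# Level 1: one record per experiment, holding the software version and
# the parameterization (both hypothetical example values).
experiments = {
    "exp-001": {"software_version": "2.1.0.rc12",
                "parameters": {"threshold": 0.5}},
    "exp-002": {"software_version": "2.1.0.rc12",
                "parameters": {"threshold": 0.7}},
}

# Level 2: every measurement carries only an experiment ID.
measurements = [
    {"name": "AreaShape_Area", "value": 412.0, "experiment_id": "exp-001"},
    {"name": "AreaShape_Area", "value": 398.0, "experiment_id": "exp-002"},
    {"name": "Intensity_Mean", "value": 0.31, "experiment_id": "exp-001"},
]


def same_analysis(a, b):
    """Two measurements are comparable if they share an experiment ID."""
    return a["experiment_id"] == b["experiment_id"]


def provenance(measurement):
    """Look up software version and parameters through the experiment ID."""
    return experiments[measurement["experiment_id"]]


print(same_analysis(measurements[0], measurements[2]))
print(provenance(measurements[1])["parameters"])
```

The single value to check is the experiment ID; the version and parameters
are only looked up when someone needs to know what actually differed.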
>
> A few initial ideas:
> * Namespace. Annotations in OMERO have a namespace (string) field. If
> the WND-CHARM results are stored in a file attached to an OMERO object,
> this will be in the form of a FileAnnotation, so the version could be
> incorporated into its namespace, e.g.:
>
> namespace=/Classifier/WndCharm/1.5
>
I like this. There are two cases. Informal matching: you match just the
head of the string (did I use WND-CHARM or CellProfiler? give me all the
CellProfiler measurements). Formal matching: the string must match exactly
(I want my experiment analyzed with /Classifier/WndCharm/2.0, is it?)
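
The two matching modes could look something like this on plain namespace
strings. The function names and the CellProfiler namespace are assumptions
for illustration; nothing here calls OMERO itself:

```python
def matches_informal(namespace, head):
    """Head match: namespace equals `head` or sits below it at a '/' boundary."""
    head = head.rstrip("/")
    return namespace == head or namespace.startswith(head + "/")


def matches_formal(namespace, expected):
    """Exact match: the whole namespace string must be identical."""
    return namespace == expected


# Example namespaces; the CellProfiler one is hypothetical.
namespaces = [
    "/Classifier/WndCharm/1.5",
    "/Classifier/WndCharm/2.0",
    "/Classifier/CellProfiler/2.1.0.rc12",
]

wndcharm = [ns for ns in namespaces
            if matches_informal(ns, "/Classifier/WndCharm")]
is_v2 = matches_formal(namespaces[0], "/Classifier/WndCharm/2.0")
print(wndcharm, is_v2)
```

Matching on a "/" boundary rather than a bare string prefix keeps a
namespace such as /Classifier/WndCharmExtra from being picked up by a query
for /Classifier/WndCharm.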
>
> * TextAnnotation. Can be added to anything, and could be used as a
> key-value pair. Note this differs from the previous in that the
> TextAnnotation is a completely separate object which is linked to the saved
> WND-CHARM object:
>
> namespace=/Classifier/WndCharm/Version
> value=1.5
>
Meh... you could just have a table with columns for namespace, version,
software name, etc. It's OK with me, I just prefer the namespace. And in
our environment, it's rare that we do an analysis with a release version -
namespace = 26c38412a1779b8135585fc014495f1488956601 means "CellProfiler
2.1.0.rc12". Rats - I tried googling the Git hash, didn't work.
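
As a sketch of the table idea, a small lookup keyed on the namespace can
turn a commit hash back into something readable. The SoftwareRecord type
and describe helper are hypothetical; the only real entry is the
hash-to-version pair quoted above:

```python
from typing import NamedTuple, Optional


class SoftwareRecord(NamedTuple):
    """One row of the hypothetical namespace / software / version table."""
    namespace: str
    software: str
    version: str


REGISTRY = [
    SoftwareRecord("26c38412a1779b8135585fc014495f1488956601",
                   "CellProfiler", "2.1.0.rc12"),
]


def describe(namespace: str) -> Optional[str]:
    """Return 'software version' for a known namespace, else None."""
    for record in REGISTRY:
        if record.namespace == namespace:
            return f"{record.software} {record.version}"
    return None


label = describe("26c38412a1779b8135585fc014495f1488956601")
print(label)
```

An unknown hash returns None, which is the case where someone would have
to go back to the repository history to identify the build.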
>
> * Key-Value pairs in OMERO. This work is currently ongoing.
>
> Thinking further ahead Gianluigi Zanetti's group have been working on
> bringing a graph database into OMERO [1], which would allow us to store the
> full history of all calculations (source data, all transformations,
> parameters, multiple runs, etc).
>
It would be cool, but I'm not sure we can put in the work to populate it,
or to use it for partial analyses on our end (no time).
>
> Cheers
>
> Simon
>
> [1] https://www.openmicroscopy.org/site/support/partner/omero.biobank/