[ome-devel] Analysis Chain Methodology

Thu Mar 8 18:19:45 GMT 2007

Hi Brian

There is a "philosophical" reason for it.  Generally an experiment  
consists of an experimental sample and one or more controls which are  
treated the same as the experimental sample.  So philosophically, in  
a real experiment, one would never do it with just a single sample or  
image.

Practically, one very often wants to do analysis on a set of images
rather than just one (primarily for the reason above), so if you build
a system that deals only with multiple images, you get a system that
handles a single image (a list of one) for free.
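To illustrate the "list of one" point, here is a minimal Python sketch. The names (`run_chain_on_set`, `threshold`) are made up for illustration and are not the OME API. A "chain" is reduced to a plain function applied to each image.

```python
from typing import Callable, Iterable, List

# Illustrative stand-ins, not the OME API: an image is just a name and a
# "chain" is any function applied to one image.
Chain = Callable[[str], str]


def run_chain_on_set(images: Iterable[str], chain: Chain) -> List[str]:
    """Apply the same chain to every image in the set."""
    return [chain(image) for image in images]


def threshold(image: str) -> str:
    # Hypothetical module: just tags the image name.
    return f"thresholded({image})"


# Sample plus controls, all processed identically.
print(run_chain_on_set(["sample", "control-1", "control-2"], threshold))
# The single-image case needs no extra code: it is a list of one.
print(run_chain_on_set(["lonely-image"], threshold))  # → ['thresholded(lonely-image)']
```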

I'm not being smart-alecky, just trying to explain the rationale  
behind the design (or at least point out there was one).  I don't  
know if I would be out of line in saying that for an experimentalist,  
doing processing on a single image could be considered an "edge case".

Back to philosophy.  I think where we went wrong (possibly) is that  
we've overloaded the concept of "Dataset".  One meaning of Dataset is  
a user-land named collection of images - i.e. for the user's  
organizational purposes.  Another meaning is that it's a collection of
images that were processed in exactly the same way.  Often you
want a single container for both - often enough that we decided to  
make it one container.  There are obviously exceptions, and we've  
used "hacks" or tricks to get around them.  One very visible hack is  
that when you import images (even just one), you are forced to either  
add them to a Dataset or create a new one.  This is because importing  
images results in executing an "Import Chain".  Having the user put  
the imported images into some sort of organizational hierarchy rather  
than just letting them float around seemed like a good thing (it's a
feature, we cried, not a bug!).

Alright, so with that out of the way, what's to be done?  Code-wise,  
it could significantly complicate things (but see below) if the AE
(Analysis Engine) had to explicitly deal with images and/or datasets in
its internals.  The AE could in fact be redesigned to address the idea
of iterators in a more general way - it should in principle be able to
iterate over any object (for example, an Image).  Currently, it
iterates over only "Datasets".  Datasets "contain" images, and most
modules are image-granularity.  More generally, it would iterate over
any container of the objects that modules operate on.  This would be a
good master's project, possibly even a PhD project.  There's some cool
formalism here dealing with computational work-flows, graph theory,  
and all manner of other juicy tidbits to make the hapless student's  
brain explode.
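A rough Python sketch of what generalized iteration might look like, assuming a hypothetical `members()` interface that any target (a Dataset, or a bare Image) exposes. None of these classes are the actual AE code; they only show that once the engine iterates over "whatever the target contains", a single image stops being a special case.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Protocol


class Target(Protocol):
    """Anything the engine can iterate over (hypothetical interface)."""

    def members(self) -> Iterator[str]: ...


@dataclass
class Dataset:
    name: str
    images: List[str] = field(default_factory=list)

    def members(self) -> Iterator[str]:
        # A dataset yields each of its images in turn.
        return iter(self.images)


@dataclass
class Image:
    name: str

    def members(self) -> Iterator[str]:
        # A bare image is a container of exactly one image-granularity object.
        yield self.name


def execute(target: Target, module: Callable[[str], str]) -> List[str]:
    """Run an image-granularity module over whatever the target contains."""
    return [module(obj) for obj in target.members()]


print(execute(Dataset("plate-1", ["a", "b"]), str.upper))  # → ['A', 'B']
print(execute(Image("c"), str.upper))  # → ['C']
```

The engine only depends on `members()`, so adding a new kind of container would not touch `execute` itself.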

Another option is to put something at the very outer layer of the AE
that will accept an image as the target of a chain execution, and
implicitly make a dataset for it.  It could even be a "special"
dataset so that it doesn't appear in the UI.  Still, if you looked
at the target of the chain execution, it would still be a Dataset  
(containing a single image).  An ugly hack to be sure, but possibly a  
practical solution if this is truly an "edge case" as I've claimed.
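A minimal sketch of option 2, assuming a hypothetical `hidden` flag on Dataset that the UI would filter on, and a stand-in `execute_chain` for the existing dataset-only entry point. These names are illustrative, not OME's.

```python
import itertools
from dataclasses import dataclass, field
from typing import List

# Counter used to give each implicit dataset a unique name.
_implicit_ids = itertools.count(1)


@dataclass
class Dataset:
    name: str
    images: List[str] = field(default_factory=list)
    hidden: bool = False  # hypothetical flag: a "special" dataset the UI skips


@dataclass
class ChainExecution:
    chain: str
    dataset: Dataset  # the recorded target is still a Dataset underneath


def execute_chain(chain: str, dataset: Dataset) -> ChainExecution:
    """Stand-in for the existing dataset-only engine entry point."""
    return ChainExecution(chain, dataset)


def execute_chain_on_image(chain: str, image: str) -> ChainExecution:
    """Outer-layer wrapper: implicitly make a one-image hidden dataset."""
    wrapper = Dataset(f"__implicit_{next(_implicit_ids)}", [image], hidden=True)
    return execute_chain(chain, wrapper)


def visible_datasets(datasets: List[Dataset]) -> List[Dataset]:
    """What a UI listing would show: everything except implicit datasets."""
    return [d for d in datasets if not d.hidden]
```

The engine itself is untouched; the cost is that every single-image run leaves behind a one-image dataset that the rest of the system has to know to ignore.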

There could be a third option, which is more directly what you're  
asking for.  The linkage between datasets and chain executions is in  
only one place - in the AnalysisChainExecution object.  Everything  
else about the chain execution (the NodeExecution, for example)  
refers to module executions.  Just like ModuleExecutions,  
ChainExecutions could have a more general "target" instead of a
"dataset".  This would take care of recording what was done, at least
in the data model.  As for the support code, we could re-use the
ModuleExecution pattern for dealing with what is essentially a
run-time-typed reference to the target.  How much code that would
involve, I don't rightly know without looking carefully, but I
suspect not a trivial amount.  My hunch is that it could be doable
though.  The advantage of this is that it's not going quite as far as
option 1, and would not be a blatant hack like option 2.
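A sketch of option 3, loosely modeled on the idea of a run-time-typed target: a type tag plus an id, resolved at execution time. `TargetRef`, `resolve_image_ids` and the lookup tables are hypothetical stand-ins for the data model and database; this is not the existing ModuleExecution code, and the `AnalysisChainExecution` class here only borrows the name.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class TargetType(Enum):
    DATASET = "dataset"
    IMAGE = "image"


@dataclass(frozen=True)
class TargetRef:
    """Run-time-typed reference: a type tag plus the target's id."""
    target_type: TargetType
    target_id: int


@dataclass
class AnalysisChainExecution:
    chain_id: int
    target: TargetRef  # a general target instead of a hard-wired dataset


# Hypothetical lookup tables standing in for the database.
DATASET_IMAGES: Dict[int, Tuple[int, ...]] = {1: (10, 11)}
IMAGE_IDS = {10, 11}


def resolve_image_ids(ref: TargetRef) -> Tuple[int, ...]:
    """Dispatch on the type tag to find the image ids a chain runs on."""
    if ref.target_type is TargetType.DATASET:
        return DATASET_IMAGES[ref.target_id]
    if ref.target_type is TargetType.IMAGE:
        if ref.target_id not in IMAGE_IDS:
            raise KeyError(ref.target_id)
        return (ref.target_id,)
    raise ValueError(f"unsupported target type: {ref.target_type}")


run = AnalysisChainExecution(chain_id=7, target=TargetRef(TargetType.IMAGE, 10))
print(resolve_image_ids(run.target))  # → (10,)
```

Only the execution record and the resolution step know about target types; the node and module executions downstream would keep seeing image-level work as they do now.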

-Ilya

On Mar 5, 2007, at 8:17 PM, Brian Ruttenberg wrote:

> I have a question about the Analysis Chain methodology employed by  
> OME.
>
> Unless I am missing something, it seems that Analysis Chains can
> only be run on data sets.  This is a major stumbling block for us.
> We really want to run chains on single images, since our data sets
> are large and the inputs to the chain are variable (it may not be
> appropriate to have the same value for every image in the dataset).
> Right now I am doing some ugly tricks to get a chain to run on a
> single image - which basically boils down to creating a dummy
> dataset with one image in it.  Obviously, this is less than ideal.
>
> I'm just curious what the reason was for not having per-image chain
> support?  And, would anyone know how difficult or time intensive it
> would be to modify OME to get it to work on single images?
>
> Thanks!
>
> Brian
> _______________________________________________
> ome-devel mailing list
> ome-devel at lists.openmicroscopy.org.uk
> http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
>