[ome-devel] OMERO.features: Development of a new API for storing image features

Simon Li s.p.li at dundee.ac.uk
Wed Aug 28 11:58:42 BST 2013


On 27 Aug 2013, at 19:26, "Ivan E. Cao-Berg" <icaoberg at andrew.cmu.edu>
 wrote:

>> Let us make the following design assumptions:
>>
>> 0. We continue to use the existing OMERO image organizational structure of
>> Project->Dataset->Image->ROI
>
>> 1. Whole images can be considered ROIs for the purpose of calculating
>> features.
>
> yes, but it is important that we keep a clear distinction between field
> and roi feature sets. that being said,
> what does that mean? that once we build a model for a table, class or
> select a data structure; is it going
> to be used interchangeably between rois and field feature sets? that
> means, whatever column or structure that holds the roi ids will be
> set to null when it belongs to a field feature set?

Right now the data structure isn't crucial: if we can pin down an API first, we're free to play with different storage implementations in OMERO. One of the reasons for suggesting ROIs only is that it easily handles 2D features in a 3D image, and generalises to additional dimensions. For example, PySLID is 2D only and records the channel/Z-index/timestamp for 3D/T images, whereas a ROI provides a single reference. The disadvantage, of course, is an additional database join in situations where you want to look up features based on an image.

Another alternative is to generalise the image/ROI ID to a (ObjectType, ObjectID) tuple, where ObjectType could be a ROI, Image, Well (which encompasses multiple images from the same screen well), Dataset or anything else.
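
As a rough illustration of the (ObjectType, ObjectID) idea, here is a minimal Python sketch. The names ObjectType, FeatureKey, store and lookup, the feature names and the in-memory dict are all made up for the example; none of this is an existing OMERO API.

```python
from dataclasses import dataclass
from enum import Enum


class ObjectType(Enum):
    """Kinds of object a feature vector could be attached to (illustrative)."""
    ROI = "Roi"
    IMAGE = "Image"
    WELL = "Well"
    DATASET = "Dataset"


@dataclass(frozen=True)
class FeatureKey:
    """Hashable (ObjectType, ObjectID) pair, usable as a dict key."""
    object_type: ObjectType
    object_id: int


# Hypothetical in-memory store: FeatureKey -> {feature name: value}
features = {}


def store(key, values):
    """Add or update feature values for one object."""
    features.setdefault(key, {}).update(values)


def lookup(key, names):
    """Return the requested feature values for one object, in the given order."""
    row = features[key]
    return [row[name] for name in names]


# An image and a ROI are keyed the same way, with no nullable ROI column
# and no join needed to look up whole-image features.
store(FeatureKey(ObjectType.IMAGE, 101), {"area": 1520.0, "mean_intensity": 0.42})
store(FeatureKey(ObjectType.ROI, 7), {"area": 64.0})
```

With this kind of key, the same ID under a different type is simply a different object: FeatureKey(ObjectType.ROI, 101) is not the entry stored for image 101.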

>> 3. Feature storage backend that allows for fast query (i.e., API call to
>> produce a feature matrix given a list of ROIs & preproc opts & feature
>> names should be as close to an O(1) operation as possible, preferably on a
>> single, large easily-queryable easily-sliceable data structure, as opposed
>> to querying multiple files or doing multiple table joins.)
>
> i think it is possible to make a hash table using the "super id" we use in
> pyslid as the hash key. essentially the key is
>
> <image id>.<pixel index>.<channel index>.<zslice index>.<timepoint
> index>.<resolution>
>
> we figured this contained enough information to make a unique identifier
> for every feature vector in the database.
> unfortunately even though searching in a hash table is O(1), vertical
> slicing is not. then again, i am not an expert on the area. any feedback
> on this?

Realistically I think we'll end up with multiple storage formats behind the same API. Some users will want to calculate features and query them in real-time; others will be happy to calculate everything on a whole screen and then "freeze" the feature set, which gives us more options for optimising the storage.
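
To make "multiple storage formats behind the same API" a bit more concrete, here is a minimal sketch in Python. FeatureStore, LiveStore, FrozenStore and their methods are hypothetical names for illustration only, and plain dicts and lists stand in for whatever storage OMERO would really use. The row-oriented store is cheap to update per object but has to touch every row to slice a column; the frozen store keeps one list per feature, so a vertical slice is a single lookup.

```python
from abc import ABC, abstractmethod


class FeatureStore(ABC):
    """Hypothetical common interface; not an existing OMERO API."""

    @abstractmethod
    def get_matrix(self, object_ids, feature_names):
        """Return one row per object ID and one column per feature name."""


class LiveStore(FeatureStore):
    """Row-oriented and mutable: suits calculating and querying as you go."""

    def __init__(self):
        self._rows = {}  # object ID -> {feature name: value}

    def put(self, object_id, values):
        self._rows.setdefault(object_id, {}).update(values)

    def get_matrix(self, object_ids, feature_names):
        return [[self._rows[i][n] for n in feature_names] for i in object_ids]

    def freeze(self):
        """Build a read-only, column-oriented copy of the current contents."""
        return FrozenStore(self._rows)


class FrozenStore(FeatureStore):
    """Column-oriented and read-only: each feature is one contiguous list."""

    def __init__(self, rows):
        # Position of each object ID within every column list.
        self._index = {oid: pos for pos, oid in enumerate(rows)}
        names = sorted({n for values in rows.values() for n in values})
        # Missing values are stored as None.
        self._columns = {n: [rows[oid].get(n) for oid in rows] for n in names}

    def column(self, name):
        """Vertical slice: all values of one feature, in insertion order."""
        return self._columns[name]

    def get_matrix(self, object_ids, feature_names):
        positions = [self._index[i] for i in object_ids]
        return [[self._columns[n][p] for n in feature_names] for p in positions]


live = LiveStore()
live.put(1, {"area": 10.0, "perimeter": 4.0})
live.put(2, {"area": 20.0, "perimeter": 6.0})
frozen = live.freeze()
```

Client code calling get_matrix would not need to know which of the two it is talking to, which is the point of settling the API before the storage.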

> other things i would like to point out
> 3) [super low priority but i think it is important] the possibility of
> importing/exporting images where the features are attached to the metadata
> (i slightly mentioned this to melissa in paris and she told me it "should"
> be possible, at least theoretically).

I agree, this touches on an important point. An OMERO features API is the first stage, but we should also have a standard format for sharing features, whether at the ROI/image/screen/repository level, outside of OMERO.


Simon


The University of Dundee is a registered Scottish Charity, No: SC015096


