<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Nov 7, 2014 at 6:44 PM, Simon Li <span dir="ltr"><<a href="mailto:s.p.li@dundee.ac.uk" target="_blank">s.p.li@dundee.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Lee<br>
<br>
I haven't done any proper benchmarking, though I did attempt to create a<br>
feature set for an image with one ROI per pixel; the bottleneck was the<br>
time taken to create the ROIs in OMERO. When it comes to retrieval, if<br>
you've stored features for a ROI, both the ROI ID and associated Image ID<br>
should be present in the table, so retrieval should be fast until you hit<br>
the limits of PyTables/OMERO.tables.<br></blockquote><div><br></div><div>Excellent, Simon, that's the sort of thing I was hoping to hear. A formal benchmark is always nice to have, but it is one more thing to pile onto a to-do list, so it's reassuring that you tried something at the limits of outrageous and saw the effects you expected.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Simon<br>
<br>
On Fri, 2014-11-07 at 15:30 -0500, Lee Kamentsky wrote:<br>
> Hi Simon, just a short comment - how does it scale with the number of<br>
> ROIs per image? It looks like the initial use case is one ROI / image<br>
> - what about 1000?<br>
><br>
> On Fri, Nov 7, 2014 at 12:18 PM, Simon Li <<a href="mailto:s.p.li@dundee.ac.uk">s.p.li@dundee.ac.uk</a>> wrote:<br>
> Hi all<br>
><br>
><br>
> It’s taken a bit longer than anticipated, but here’s a first<br>
> implementation for the features API:<br>
> <a href="https://github.com/ome/omero-features" target="_blank">https://github.com/ome/omero-features</a><br>
><br>
><br>
> It’s more or less what I described in my last email. To<br>
> start with, I recommend looking at the README and example.py in<br>
> the repository root to see how it all works. The key features<br>
> are:<br>
> * Store and retrieve features using an Image-ID or<br>
> Roi-ID.<br>
> * Each feature set consists of a number of named<br>
> features and is per-user-per-group to avoid<br>
> permissions problems.<br>
> * The underlying storage format is OMERO.tables which<br>
> has some limitations, but one of the aims of this work<br>
> is to figure out exactly what we want to replace it<br>
> with.<br>
> As an example of using it I’ve created a new branch of<br>
> OMERO.wndcharm:<br>
> <a href="https://github.com/manics/omero-wndcharm/tree/omero-features" target="_blank">https://github.com/manics/omero-wndcharm/tree/omero-features</a><br>
> Note only the original WND-CHARM image features are stored<br>
> using the features API, other information such as<br>
> classification labels, results, and feature weights can’t be<br>
> stored at present. They would require additional columns to<br>
> store row metadata, which is an obvious next step.<br>
><br>
><br>
> There’s obviously a lot more that could be handled by the API,<br>
> but this is the sort of thing that really needs input from<br>
> potential users so take a look and let me know what you think,<br>
> either on this list, as a GitHub issue, or on the Wiki.<br>
><br>
><br>
> Cheers<br>
><br>
><br>
> Simon<br>
><br>
><br>
> From: Simon Li <<a href="mailto:s.p.li@dundee.ac.uk">s.p.li@dundee.ac.uk</a>><br>
> Date: Wednesday, 27 August 2014 15:50<br>
> To: Lee Kamentsky <<a href="mailto:leek@broadinstitute.org">leek@broadinstitute.org</a>>, "Coletta,<br>
> Christopher (NIH/NIA/IRP) [E]" <<a href="mailto:christopher.coletta@nih.gov">christopher.coletta@nih.gov</a>>,<br>
> "Ivan E. Cao-Berg" <<a href="mailto:icaoberg@andrew.cmu.edu">icaoberg@andrew.cmu.edu</a>>, Joaquin Correa<br>
> <<a href="mailto:joaquincorrea@lbl.gov">joaquincorrea@lbl.gov</a>><br>
> Cc: OME Development <<a href="mailto:ome-devel@lists.openmicroscopy.org.uk">ome-devel@lists.openmicroscopy.org.uk</a>><br>
> Subject: Re: [ome-devel] OMERO.features API development<br>
><br>
><br>
><br>
> Hi all<br>
><br>
><br>
> I had a discussion with Jason and Jean-Marie in Dundee a<br>
> couple of weeks ago on how to make progress with the<br>
> OMERO.features API. The problem of properly storing all the<br>
> metadata we require is obviously extremely important, but it's<br>
> going to take a while to figure out.<br>
><br>
><br>
> So, to begin with we came up with the idea of a simplified API<br>
> that would work as a client-side Python library and should be<br>
> relatively simple to implement. I've described it on the<br>
> OMERO.features wiki:<br>
> <a href="https://github.com/ome/omero-features/wiki/API-Outline-V1" target="_blank">https://github.com/ome/omero-features/wiki/API-Outline-V1</a><br>
><br>
><br>
> The key points are:<br>
><br>
><br>
> * Store the Image and/or ROI ID (which allows us to specify<br>
> planes/channels/tiles without needing to add columns to the<br>
> table), and feature names/values, using OMERO.tables.<br>
> * Provide methods to store a feature row, retrieve a feature<br>
> row by Image/ROI ID, and to select rows by querying feature<br>
> values (simple comparison operators), ideally with a simple to<br>
> understand syntax.<br>
> * Implement as a client-side Python library.<br>
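<br>
The three operations listed above (store a row, retrieve by Image/ROI ID, select by comparing feature values) can be sketched roughly as follows. This is only an in-memory illustration, not the omero-features API: the class name, method names, feature names and row layout are all assumptions made for the example, and a plain Python list stands in for OMERO.tables.<br>
<pre>
```python
import operator

# Simple comparison operators allowed when selecting rows by feature value.
OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">=": operator.ge, ">": operator.gt}


class FeatureTable:
    """Hypothetical in-memory stand-in for one feature set."""

    def __init__(self, feature_names):
        self.feature_names = list(feature_names)
        self.rows = []  # each row is (image_id, roi_id, [values])

    def store(self, image_id, roi_id, values):
        # One value per named feature; roi_id may be None for whole-image rows.
        if len(values) != len(self.feature_names):
            raise ValueError("expected one value per named feature")
        self.rows.append((image_id, roi_id, list(values)))

    def fetch_by_image(self, image_id):
        return [r for r in self.rows if r[0] == image_id]

    def fetch_by_roi(self, roi_id):
        return [r for r in self.rows if r[1] == roi_id]

    def select(self, feature, op, threshold):
        # Select rows where the named feature compares true against threshold.
        col = self.feature_names.index(feature)
        compare = OPS[op]
        return [r for r in self.rows if compare(r[2][col], threshold)]


table = FeatureTable(["area", "mean_intensity"])
table.store(image_id=1, roi_id=10, values=[52.0, 0.31])
table.store(image_id=1, roi_id=11, values=[75.5, 0.12])
table.store(image_id=2, roi_id=None, values=[300.0, 0.44])
```
</pre>
With this layout, table.select("area", ">", 60) returns the rows for ROI 11 and for image 2, and table.fetch_by_image(1) returns both ROI rows of image 1.<br>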
><br>
><br>
> This means the more complicated requirements such as linking<br>
> objects other than images/ROIs, more complicated feature<br>
> metadata, and supporting both efficient bulk and random access<br>
> will be left for the next iteration. As always comments are<br>
> welcome; if you think this is too simple to be useful don't be<br>
> afraid to say so.<br>
><br>
><br>
> Cheers<br>
><br>
><br>
> Simon<br>
><br>
><br>
><br>
><br>
> On 08/05/2014 15:20, "Lee Kamentsky" <<a href="mailto:leek@broadinstitute.org">leek@broadinstitute.org</a>><br>
> wrote:<br>
><br>
><br>
> Hi all,<br>
><br>
><br>
> On Thu, May 8, 2014 at 9:40 AM, Simon Li<br>
> <<a href="mailto:s.p.li@dundee.ac.uk">s.p.li@dundee.ac.uk</a>> wrote:<br>
> Hi all<br>
><br>
><br>
> I've started a GitHub repository for trying<br>
> out some OMERO.features ideas based on what I<br>
> mentioned in the last email:<br>
> <a href="https://github.com/manics/omero-features" target="_blank">https://github.com/manics/omero-features</a><br>
><br>
><br>
> There's not a great deal in there at the<br>
> moment. It's just saving features into a local<br>
> HDF5 file using PyTables, and example.py<br>
> creates a table similar to that used by PySLID<br>
> (OMERO.searcher). timings.txt shows some rough<br>
> run-times. Key-value row pairs are mapped to<br>
> table columns; however, this means each row has<br>
> to have the same keys. There's no simple way<br>
> to have a key-value map per column, so for now<br>
> I'm just storing multiple features in one<br>
> column.<br>
><br>
><br>
> This is easily convertible to OMERO.tables,<br>
> columns could be labelled using OMERO<br>
> annotations (in 5.1 there's a new<br>
> MapAnnotation), though it effectively means<br>
> each group of features is stored separately<br>
> and thus would need to be queried separately.<br>
> Alternatively an auxiliary table could be used<br>
> to store the per-column key-value pairs,<br>
> similar to how column descriptions are<br>
> currently stored in OMERO.tables.<br>
><br>
><br>
> A major limitation is that database joins<br>
> between OMERO and a feature-table aren't<br>
> practical. For example, if each feature row is<br>
> labelled with an image ID, and you want to<br>
> select a subset of rows using an OMERO query,<br>
> you have to pass a list of image IDs to the<br>
> PyTables query function which from my initial<br>
> testing is very limited in the number of<br>
> parameters it'll handle (I get a stack<br>
> overflow if too many image IDs are passed).<br>
><br>
><br>
> In practice this means you'd either need the<br>
> feature table to contain any metadata<br>
> necessary for selecting rows (e.g. dataset ID,<br>
> experiment parameters, annotations) even if<br>
> this means duplicating information held in<br>
> OMERO, or split the query up (very<br>
> inefficient). This is probably fine for people<br>
> dealing with features in bulk where you might<br>
> download all features for a screen for offline<br>
> processing, not so good for real-time<br>
> searching such as OMERO.searcher where you'd<br>
> either need to store everything you need for<br>
> pre-filtering search results in the table, or<br>
> read all features and do the filtering<br>
> afterwards.<br>
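<br>
The "split the query up" option mentioned above could look roughly like this: the list of image IDs is cut into small batches and one condition string is built per batch, so no single condition grows too large. The column name ImageID and the batch size are assumptions for illustration. Each string uses the usual PyTables condition syntax and could be passed to a query such as Table.read_where, at the cost of one query per batch, which is the inefficiency noted above.<br>
<pre>
```python
def chunked(ids, size):
    """Yield successive lists of at most `size` IDs."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def id_conditions(ids, column="ImageID", chunk_size=50):
    """Build one condition string per chunk of IDs.

    `column` and `chunk_size` are illustrative defaults, not values
    taken from OMERO.tables or the omero-features repository.
    """
    return [" | ".join("({} == {})".format(column, int(i)) for i in chunk)
            for chunk in chunked(ids, chunk_size)]


# Five image IDs split into batches of two.
conditions = id_conditions(range(1, 6), chunk_size=2)
for condition in conditions:
    print(condition)
```
</pre>
The results of the per-batch queries would then need to be concatenated on the client side.<br>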
><br>
><br>
> Probably OK as far as developing the API is<br>
> concerned, but longer term it suggests we need<br>
> some other storage mechanism. Some of you will<br>
> remember Joaquin Correa from Paris last year.<br>
> He's currently working on his own feature<br>
> storage implementation at LBL, so potentially<br>
> this is something we could look at for OMERO,<br>
> and of course there are many other<br>
> possibilities.<br>
><br>
><br>
> People in other groups here (Broad Institute) are<br>
> looking at MongoDB as an alternative to HDF5 - we are<br>
> all sort of struggling with the same types of problems<br>
> and I don't think anyone has found a solution. In<br>
> CellProfiler, for HDF5, we maintain a dataset of<br>
> per-image indexing information into the HDF5 datasets<br>
> and perhaps that's an appropriate hybrid approach for<br>
> OMERO - the join returns the slicing information<br>
> needed for pulling the data out of the datasets and<br>
> you then retrieve the data from HDF5. We store ~1K<br>
> values per image per feature (one value per cell or<br>
> other segmented object), so for us, each round-trip<br>
> down the HDF5 stack deals with a reasonable amount of<br>
> data. HDF5 slicing isn't as flexible as Numpy - you<br>
> can ask for ranges, but not a list of individual<br>
> dimension coordinates which would be what you'd want<br>
> if you were returning data for a large number of rows.<br>
> Fetching individual values from HDF5 is painfully slow<br>
> for our scale of experiments. There is no really good solution;<br>
> maybe all I have to contribute on the topic is "I feel<br>
> your pain" :-(<br>
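<br>
A rough sketch of the hybrid idea described above: a small per-image index maps each image ID to the row range holding its values, and because range reads are cheap while per-value reads are slow, adjacent ranges are merged before reading. The index contents and function names are invented for the example; this is not CellProfiler's actual layout.<br>
<pre>
```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent half-open (start, stop) ranges."""
    merged = []
    for start, stop in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged


# Hypothetical index: image ID -> (first row, one past last row)
# of that image's values in a feature dataset.
image_index = {
    101: (0, 1000),
    102: (1000, 1950),
    103: (1950, 3000),
    107: (5200, 6100),
}


def ranges_for(image_ids, index):
    """Return the minimal list of row ranges covering the given images."""
    return merge_ranges(index[i] for i in image_ids)


ranges = ranges_for([103, 101, 102, 107], image_index)
print(ranges)
```
</pre>
Here the four requested images collapse into two range reads, rows 0 to 3000 and rows 5200 to 6100; a reader would then slice the dataset once per range.<br>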
><br>
><br>
><br>
> Simon<br>
><br>
><br>
><br>
><br>
><br>
> On 24 Apr 2014, at 12:57, Lee Kamentsky<br>
> <<a href="mailto:leek@broadinstitute.org">leek@broadinstitute.org</a>> wrote:<br>
><br>
> > Hi all,<br>
> > Just chiming in, since we were mentioned...<br>
> ><br>
> > On Wed, Apr 23, 2014 at 5:10 PM, Simon Li<br>
> > <<a href="mailto:s.p.li@dundee.ac.uk">s.p.li@dundee.ac.uk</a>> wrote:<br>
> > Hi all<br>
> ><br>
> > Now that OMERO 5.0 is out of the<br>
> > way, and OMERO.searcher and WND-CHARM<br>
> > are either released or very close to<br>
> > release, I think it's time to<br>
> > restart our OMERO.features<br>
> > discussions.<br>
> ><br>
> > We got as far as the idea of a 2D<br>
> > table with any number of key-value<br>
> > pairs on each column and row, so for<br>
> > example each row could be as simple<br>
> > as (OmeroType: Image, OmeroId: 123),<br>
> > or in the case of features which are<br>
> > a function of multiple images or<br>
> > channels (OmeroType: Image, OmeroId:<br>
> > 123, Channel1: 0, Channel2: 3), etc.<br>
> > Columns could for example be<br>
> > (FeatureFamily: WndCharm, Feature:<br>
> > Zernike). Each table cell could<br>
> > either be a scalar or array.<br>
> > Retrieving features could be done by<br>
> > providing key-value pairs to be<br>
> > matched.<br>
> ><br>
> > All of this is still up for<br>
> > discussion, especially since the<br>
> > implementation of this interface<br>
> > could be challenging and there's<br>
> > some redundancy/ambiguity. Just to<br>
> > be clear, the above is a conceptual<br>
> > description of how the API would<br>
> > appear to users, the actual back-end<br>
> > could be completely different.<br>
> ><br>
> > Lee Kamentsky gave us a use case<br>
> > just before Christmas [1], Chris<br>
> > Coletta and Ivan Cao-Berg are<br>
> > planning to summarise how they see<br>
> > WND-CHARM and OMERO.searcher fitting<br>
> > in. I know a few other people are<br>
> > interested in this discussion, so<br>
> > feel free to respond here or in the<br>
> > forums.<br>
> ><br>
> ><br>
> > For us, it's important to link features to<br>
> > regions of interest, specifically<br>
> > segmentations of whole cells and cellular<br>
> > compartments. The other issues have to do<br>
> > with scalability and the efficiency of<br>
> > retrieving large data sets either by<br>
> > selecting a few features for a large number<br>
> > of images (e.g. up to on the order of<br>
> > 1,000,000 images and 1,000 entries per<br>
> > feature per image) or by selecting many or<br>
> > all features associated with a subset of the<br>
> > regions of interest.<br>
> ><br>
> ><br>
> > We are also interested in recording tracking<br>
> > data. What's needed here is the ability to<br>
> > record a link between the region of interest<br>
> > in one frame of a time-series stack with a<br>
> > region of interest in a later frame and you<br>
> > need the flexibility of a many-many<br>
> > relationship to represent cell division and<br>
> > potentially merging. I'm fairly confident<br>
> > that you could encode that sort of thing in<br>
> > a 2-D table which had columns referencing<br>
> > both ROIs.<br>
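<br>
As a rough illustration of the tracking idea above, a two-column table of (earlier ROI, later ROI) links is enough to express persistence, division and merging as a many-to-many relationship. The ROI IDs and column meaning are made up for the example.<br>
<pre>
```python
from collections import defaultdict

# Each row links a ROI in one frame to a ROI in a later frame.
# Hypothetical columns: (parent_roi_id, child_roi_id)
links = [
    (1, 2),  # ROI 1 persists as ROI 2
    (2, 3),  # ROI 2 divides into ROIs 3 and 4
    (2, 4),
    (3, 5),  # ROIs 3 and 4 later merge into ROI 5
    (4, 5),
]

# Build lookups in both directions from the link rows.
children = defaultdict(list)
parents = defaultdict(list)
for parent, child in links:
    children[parent].append(child)
    parents[child].append(parent)


def descendants(roi_id):
    """Return every ROI reachable from roi_id by following links forward."""
    seen, stack = set(), [roi_id]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return sorted(seen)


print(children[2])     # the two daughters of ROI 2
print(parents[5])      # the two ROIs that merged into ROI 5
print(descendants(1))  # the full lineage of ROI 1
```
</pre>
A division shows up as one parent with several child rows, and a merge as one child with several parent rows, so no extra structure beyond the two ROI columns is needed.<br>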
> ><br>
> ><br>
> > Finally, we try to capture enough<br>
> > information about the analysis to make it<br>
> > reproducible - things like the pipeline used<br>
> > for the analysis, the Git hash of the<br>
> > software used to run the analysis and of<br>
> > each image analyzed. I think all of that is<br>
> > easily captured, though, in the tables and I<br>
> > doubt we need any explicit functionality<br>
> > devoted to that. It might be nice to be able<br>
> > to annotate the table itself with attributes<br>
> > in order to document the linking of the<br>
> > analysis results to the experimental<br>
> > protocol, but the linking could be<br>
> > documented using columns in an<br>
> > experiment-wide table.<br>
> ><br>
> > A few of us are planning to meet up<br>
> > at the OME Paris meeting; if you're<br>
> > interested drop me an email.<br>
> ><br>
> > Thanks<br>
> ><br>
> > Simon<br>
> ><br>
> > [1]<br>
> > <a href="http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2013-November/002573.html" target="_blank">http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2013-November/002573.html</a><br>
> ><br>
> ><br>
> > On 7 Nov 2013, at 14:20, Simon Li<br>
> > <<a href="mailto:s.p.li@dundee.ac.uk">s.p.li@dundee.ac.uk</a>> wrote:<br>
> ><br>
> > > Some notes from our meeting<br>
> > yesterday:<br>
> > ><br>
> > <a href="http://www.openmicroscopy.org/site/community/minutes/minigroup/omero-features-meetings/2013-11-06-omero-features-google-hangout" target="_blank">http://www.openmicroscopy.org/site/community/minutes/minigroup/omero-features-meetings/2013-11-06-omero-features-google-hangout</a><br>
> > ><br>
> > > Summary:<br>
> > > We're thinking of representing<br>
> > features as a 2D array, with<br>
> > metadata stored as key-value maps<br>
> > attached to the array, or individual<br>
> > columns or rows. These keys could<br>
> > describe things such as the feature<br>
> > name (column), sample metadata<br>
> > (row), algorithm parameters,<br>
> > calculation pipelines, etc.<br>
> > ><br>
> > > This should work as an OMERO API:<br>
> > in order to retrieve features you'd<br>
> > pass in a set of key-value pairs,<br>
> > for instance to specify which<br>
> > features you want and which<br>
> > images/ROIs etc, and OMERO would<br>
> > handle the logic and return the<br>
> > feature table(s) matching those<br>
> > parameters. Since everyone has<br>
> > different requirements the keys<br>
> > could be anything, however we're<br>
> > trying to define a small set of<br>
> > standard keys; any suggestions are<br>
> > very welcome.<br>
> > ><br>
> > > Outside of OMERO we still need a<br>
> > format for transporting features, so<br>
> > we're thinking some form of HDF5.<br>
> > ><br>
> > > Simon<br>
> > ><br>
> > ><br>
> > > The University of Dundee is a<br>
> > registered Scottish Charity, No:<br>
> > SC015096<br>
> > ><br>
> > _______________________________________________<br>
> > > ome-devel mailing list<br>
> > ><br>
> > <a href="mailto:ome-devel@lists.openmicroscopy.org.uk">ome-devel@lists.openmicroscopy.org.uk</a><br>
> > ><br>
> > <a href="http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel" target="_blank">http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel</a><br>
> ><br>
> ><br>
> ><br>
> ><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
<br>
<br>
</blockquote></div><br></div></div>