<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    Hi all,<br>

    <br>

    &nbsp;&nbsp;&nbsp; I have been following the discussion on OMERO.features, as it
    falls quite close to issues we have and things we are doing as
    well, even though I have not participated so far. I thought I would
    chime in to say what we are up to in this domain; we'll be in Paris
    for the OME user meeting, likely with a poster, and happy to
    discuss...<br>

    <br>

    &nbsp;&nbsp;&nbsp; We are doing yeast genome-wide HT/HC microscopy studies, and up
    to now we've used an SQL database to store cell-wise features
    extracted from images (as we are just one lab with coherent
    experiments, but a lot of data, it sounded like a good idea). We are
    currently looking into Neo4j as a more flexible solution, and
    working on an HTML5 front-end to open up visualisation and mining to
    the wider audience of experimental biologists. Although I haven't
    used it that deeply yet, I quite like Neo4j as it keeps some
    structure in the data while not being as rigid as SQL. And I guess
    we went the opposite way and stored all relevant information in the
    SQL/Neo4j DB, with the link to OMERO provided by storing the IDs of
    the relevant OMERO objects in that DB.<br>
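    <br>
    &nbsp;&nbsp;&nbsp; As a rough illustration of keeping the OMERO IDs next to the
    features, here is a minimal sketch using Python's built-in sqlite3
    module. The table and column names (cell_features, omero_image_id,
    area, mean_intensity) and the values are made up for this example;
    this is not our actual schema.<br>
<pre>
```python
import sqlite3

# Hypothetical schema: one row per segmented cell, with the OMERO image ID
# stored as a plain integer so each result can be traced back to OMERO.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cell_features ("
    " cell_id INTEGER PRIMARY KEY,"
    " omero_image_id INTEGER NOT NULL,"
    " area REAL,"
    " mean_intensity REAL)"
)
# Index the OMERO ID because it is the usual selection key.
conn.execute("CREATE INDEX idx_image ON cell_features (omero_image_id)")

# Made-up example data: two cells in image 123, one cell in image 456.
rows = [
    (1, 123, 40.5, 0.82),
    (2, 123, 38.0, 0.79),
    (3, 456, 51.2, 0.64),
]
conn.executemany("INSERT INTO cell_features VALUES (?, ?, ?, ?)", rows)


def features_for_image(connection, omero_image_id):
    """Return (cell_id, area, mean_intensity) for every cell in one OMERO image."""
    cursor = connection.execute(
        "SELECT cell_id, area, mean_intensity FROM cell_features"
        " WHERE omero_image_id = ? ORDER BY cell_id",
        (omero_image_id,),
    )
    return cursor.fetchall()


cells_in_123 = features_for_image(conn, 123)
```
</pre>
    The OMERO side is only referenced by ID here, so anything else
    needed for selection has to live in the same database.<br>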

    <br>

    &nbsp;&nbsp; Again, happy to talk in Paris about those things, which are

    indeed a common pain in the community...<br>

    <br>

    Cheers<br>

    Anatole<br>

    <br>

    <br>

    <div class="moz-cite-prefix">On 08/05/2014 14:40, Simon Li wrote:<br>

    </div>

    <blockquote

      cite="mid:324E93CC-3390-4A86-BBE8-A933C3D001D9@dundee.ac.uk"

      type="cite">


      <div>

        <div>Hi all</div>

        <div><br>

        </div>

        <div>I've started a Github repository for trying out some

          OMERO.features ideas based on what I mentioned in the last

          email:</div>

        <div><a moz-do-not-send="true"

            href="https://github.com/manics/omero-features">https://github.com/manics/omero-features</a></div>

        <div><br>

        </div>

        <div>There's not a great deal in there at the moment.&nbsp;It just
          saves features into a local HDF5 file using PyTables, and
          example.py creates a table similar to that used by Pyslid
          (OMERO.searcher). timings.txt shows some rough
          run-times.&nbsp;Key-value row pairs are mapped to table columns;
          however, this means each row has to have the same keys.&nbsp;There's
          no simple way to have a key-value map per column, so for now I'm
          just storing multiple features in one column.</div>
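        <div><br>
        </div>
        <div>To make the "same keys on every row" constraint concrete,
          here is a minimal sketch in plain Python. It is not the code in
          the repository and does not use PyTables; the function name
          rows_to_table is invented for illustration.</div>
<pre>
```python
def rows_to_table(rows):
    """Map a list of key-value rows onto one fixed set of columns.

    Every row must have exactly the same keys, mirroring the constraint
    that a table has a single, fixed column layout.
    """
    if not rows:
        return [], []
    columns = sorted(rows[0])
    data = []
    for index, row in enumerate(rows):
        if sorted(row) != columns:
            raise ValueError(
                f"row {index} has keys {sorted(row)}, expected {columns}"
            )
        # Store values in column order so each tuple is one table row.
        data.append(tuple(row[name] for name in columns))
    return columns, data


columns, data = rows_to_table([
    {"OmeroType": "Image", "OmeroId": 123},
    {"OmeroType": "Image", "OmeroId": 124},
])
```
</pre>
        <div>A row with an extra key such as Channel1 would be rejected
          here, which is the limitation described above.</div>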

        <div><br>

        </div>

        <div>This is easily convertible to OMERO.tables. Columns could
          be labelled using OMERO annotations (in 5.1 there's a new
          MapAnnotation), though that effectively means each group of
          features is stored separately and thus would need to be
          queried separately. Alternatively, an auxiliary table could be
          used to store the per-column key-value pairs, similar to how
          column descriptions are currently stored in OMERO.tables.</div>

        <div><br>

        </div>

        <div>A major limitation is that database joins between OMERO and
          a feature table aren't practical.&nbsp;For example, if each feature
          row is labelled with an image ID, and you want to select a
          subset of rows using an OMERO query, you have to pass a list
          of image IDs to the PyTables query function, which from my
          initial testing is very limited in the number of parameters
          it'll handle (I get a stack overflow if too many image IDs are
          passed).</div>

        <div><br>

        </div>

        <div>In practice this means you'd either need the feature table
          to contain any metadata necessary for selecting rows (e.g.
          dataset ID, experiment parameters, annotations), even if this
          means duplicating information held in OMERO, or split the
          query up (very inefficient).&nbsp;This is probably fine for people
          dealing with features in bulk, where you might download all
          features for a screen for offline processing. It's not so good
          for real-time searching such as OMERO.searcher, where you'd
          either need to store everything you need for pre-filtering
          search results in the table, or read all features and do the
          filtering afterwards.</div>
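        <div><br>
        </div>
        <div>A minimal sketch of the "split the query up" option, in
          plain Python. The list of dicts stands in for the feature
          table, and the names chunked and select_rows and the default
          chunk size are assumptions made for the example, not part of
          PyTables or OMERO.</div>
<pre>
```python
def chunked(ids, size):
    """Yield consecutive slices of ids, each with at most size elements."""
    step = max(1, size)
    for start in range(0, len(ids), step):
        yield ids[start:start + step]


def select_rows(feature_rows, image_ids, chunk_size=100):
    """Select rows whose image_id is in image_ids, one small batch at a time.

    Each batch plays the role of one bounded query, so no single query
    ever receives the full ID list.
    """
    selected = []
    for batch in chunked(list(image_ids), chunk_size):
        wanted = set(batch)
        # One pass over the table per batch: this repeated scanning is
        # the inefficiency of splitting the query.
        selected.extend(
            row for row in feature_rows if row["image_id"] in wanted
        )
    return selected


# Made-up feature rows for ten images.
feature_rows = [{"image_id": i, "value": i * 0.5} for i in range(10)]
subset = select_rows(feature_rows, [1, 3, 5, 7], chunk_size=2)
```
</pre>
        <div>The number of table scans grows with the number of batches,
          so this only helps when the ID list is modest.</div>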

        <div><br>

        </div>

        <div>Probably OK as far as developing the API is concerned, but

          longer term it suggests we need some other storage

          mechanism.&nbsp;Some of you will remember Joaquin Correa from Paris

          last year. He's currently working on his own feature storage

          implementation at LBL, so potentially this is something we

          could look at for OMERO, and of course there are many other

          possibilities.</div>

        <div><br>

        </div>

        <div>Simon</div>

      </div>

      <div><br>

      </div>

      <div><br>

      </div>

      <br>

      <div>

        <div>On 24 Apr 2014, at 12:57, Lee Kamentsky &lt;<a

            moz-do-not-send="true" href="mailto:leek@broadinstitute.org">leek@broadinstitute.org</a>&gt;

          wrote:</div>

        <br class="Apple-interchange-newline">

        <blockquote type="cite">

          <div dir="ltr">Hi all,<br>

            <div class="gmail_extra">Just chiming in, since we were

              mentioned...<br>

              <br>

              <div class="gmail_quote">On Wed, Apr 23, 2014 at 5:10 PM,

                Simon Li <span dir="ltr">

                  &lt;<a moz-do-not-send="true"

                    href="mailto:s.p.li@dundee.ac.uk" target="_blank">s.p.li@dundee.ac.uk</a>&gt;</span>

                wrote:<br>

                <blockquote class="gmail_quote" style="margin:0 0 0

                  .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  Hi all<br>

                  <br>

                  Now that OMERO 5.0 is out of the way, and

                  OMERO.searcher and WND-CHRM are either released or

                  very close to release, I think it's time to restart

                  our OMERO.features discussions.<br>

                  <br>

                  We got as far as the idea of a 2D table with any

                  number of key-value pairs on each column and row, so

                  for example each row could be as simple as (OmeroType:

                  Image, OmeroId: 123), or in the case of features which

                  are a function of multiple images or channels

                  (OmeroType: Image, OmeroId: 123, Channel1: 0,

                  Channel2: 3), etc. Columns could for example be

                  (FeatureFamily: WndCharm, Feature: Zernike). Each

                  table cell could either be a scalar or array.

                  Retrieving features could be done by providing

                  key-value pairs to be matched.<br>
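                  <br>
                  As a rough illustration only, and not a proposed
                  implementation, the conceptual table could be modelled
                  in plain Python as below. The class name FeatureTable,
                  its methods, the Haralick column and the cell values
                  are invented for the example; the other key names come
                  from the description above.<br>
<pre>
```python
class FeatureTable:
    """Toy 2D table whose rows and columns each carry a key-value map."""

    def __init__(self):
        self.row_meta = []     # one dict of key-value pairs per row
        self.column_meta = []  # one dict of key-value pairs per column
        self.cells = []        # cells[r][c] is a scalar or a list

    def add_column(self, **meta):
        self.column_meta.append(meta)
        # Existing rows get an empty cell for the new column.
        for row in self.cells:
            row.append(None)

    def add_row(self, values, **meta):
        if len(values) != len(self.column_meta):
            raise ValueError("expected one value per column")
        self.row_meta.append(meta)
        self.cells.append(list(values))

    @staticmethod
    def _matches(meta, query):
        # A map matches when every queried key is present with that value.
        return all(meta.get(key) == value for key, value in query.items())

    def select(self, row_query=None, column_query=None):
        """Return the cells whose row and column maps match the queries."""
        row_query = row_query or {}
        column_query = column_query or {}
        cols = [
            c for c, meta in enumerate(self.column_meta)
            if self._matches(meta, column_query)
        ]
        return [
            [self.cells[r][c] for c in cols]
            for r, meta in enumerate(self.row_meta)
            if self._matches(meta, row_query)
        ]


table = FeatureTable()
table.add_column(FeatureFamily="WndCharm", Feature="Zernike")
table.add_column(FeatureFamily="WndCharm", Feature="Haralick")
table.add_row([[0.1, 0.2], 3.5], OmeroType="Image", OmeroId=123)
table.add_row([[0.3, 0.4], 4.5], OmeroType="Image", OmeroId=124)
zernike_123 = table.select({"OmeroId": 123}, {"Feature": "Zernike"})
```
</pre>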

                  <br>

                  All of this is still up for discussion, especially

                  since the implementation of this interface could be

                  challenging and there's some redundancy/ambiguity.

                  Just to be clear, the above is a conceptual

                  description of how the API would appear to users, the

                  actual back-end could be completely different.<br>

                  <br>

                  Lee Kamentsky gave us a use case just before Christmas
                  [1], and Chris Coletta and Ivan Cao-berg are planning to
                  summarise how they see WND-CHARM and OMERO.searcher
                  fitting in. I know a few other people are interested
                  in this discussion, so feel free to respond here or in
                  the forums.<br>

                </blockquote>

                <div><br>

                </div>

                <div>For us, it's important to link features to regions

                  of interest, specifically segmentations of whole cells

                  and cellular compartments. The other issues have to do

                  with scalability and the efficiency of retrieving

                  large data sets either by selecting a few features for

                  a large number of images (e.g. up to on the order of

                  1,000,000 images and 1,000 entries per feature per

                  image) or by selecting many or all features associated

                  with a subset of the regions of interest.</div>

                <div><br>

                </div>

                <div>We are also interested in recording tracking data.
                  What's needed here is the ability to record a link
                  between the region of interest in one frame of a
                  time-series stack and a region of interest in a later
                  frame, with the flexibility of a many-to-many
                  relationship to represent cell division and
                  potentially merging. I'm fairly confident that you
                  could encode that sort of thing in a 2-D table which
                  had columns referencing both ROIs.</div>
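                <div><br>
                </div>
                <div>A minimal sketch of such a link table in plain
                  Python, assuming one row per link with two columns
                  (source ROI, target ROI). The ROI IDs and the names
                  links and build_index are made up for the example.</div>
<pre>
```python
from collections import defaultdict

# Each row links an ROI in one frame to an ROI in a later frame.
# Columns: (source_roi_id, target_roi_id).
links = [
    (10, 20),  # cell 10 persists as ROI 20
    (20, 30),  # cell 20 divides ...
    (20, 31),  # ... into ROIs 30 and 31
    (11, 32),  # cells 11 and 12 ...
    (12, 32),  # ... merge into ROI 32
]


def build_index(link_rows):
    """Index the link rows in both directions."""
    children = defaultdict(list)
    parents = defaultdict(list)
    for source, target in link_rows:
        children[source].append(target)
        parents[target].append(source)
    return children, parents


children, parents = build_index(links)

# Every indexed ROI has at least one link, so "!= 1" means two or more.
divisions = sorted(roi for roi, out in children.items() if len(out) != 1)
merges = sorted(roi for roi, src in parents.items() if len(src) != 1)
```
</pre>
                <div>Because both columns may repeat, the same two-column
                  layout covers persistence, division and merging.</div>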

                <div><br>

                </div>

                <div>Finally, we try to capture enough information about

                  the analysis to make it reproducible - things like the

                  pipeline used for the analysis, the Git hash of the

                  software used to run the analysis and of each image

                  analyzed. I think all of that is easily captured,

                  though, in the tables and I doubt we need any explicit

                  functionality devoted to that. It might be nice to be

                  able to annotate the table itself with attributes in

                  order to document the linking of the analysis results

                  to the experimental protocol, but the linking could be

                  documented using columns in an experiment-wide table.</div>

                <blockquote class="gmail_quote" style="margin:0 0 0

                  .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  <br>

                  A few of us are planning to meet up at the OME Paris
                  meeting; if you're interested, drop me an email.<br>

                  <br>

                  Thanks<br>

                  <br>

                  Simon<br>

                  <br>

                  [1] <a moz-do-not-send="true"

href="http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2013-November/002573.html"

                    target="_blank">

http://lists.openmicroscopy.org.uk/pipermail/ome-devel/2013-November/002573.html</a><br>

                  <br>

                  <br>

                  On 7 Nov 2013, at 14:20, Simon Li &lt;<a

                    moz-do-not-send="true"

                    href="mailto:s.p.li@dundee.ac.uk">s.p.li@dundee.ac.uk</a>&gt;

                  wrote:<br>

                  <br>

                  &gt; Some notes from our meeting yesterday:<br>

                  &gt; <a moz-do-not-send="true"

href="http://www.openmicroscopy.org/site/community/minutes/minigroup/omero-features-meetings/2013-11-06-omero-features-google-hangout"

                    target="_blank">

http://www.openmicroscopy.org/site/community/minutes/minigroup/omero-features-meetings/2013-11-06-omero-features-google-hangout</a><br>

                  &gt;<br>

                  &gt; Summary:<br>

                  &gt; We're thinking of representing features as a 2D

                  array, with metadata stored as key-value maps attached

                  to the array, or individual columns or rows. These

                  keys could describe things such as the feature name

                  (column), sample metadata (row), algorithm parameters,

                  calculation pipelines, etc.<br>

                  &gt;<br>

                  &gt; This should work as an OMERO API: in order to
                  retrieve features you'd pass in a set of key-value
                  pairs, for instance to specify which features you want
                  and which images/ROIs etc., and OMERO would handle the
                  logic and return the feature table(s) matching those
                  parameters. Since everyone has different requirements
                  the keys could be anything; however, we're trying to
                  define a small set of standard keys, and any suggestions
                  are very welcome.<br>

                  &gt;<br>

                  &gt; Outside of OMERO we still need a format for

                  transporting features, so we're thinking some form of

                  HDF5.<br>

                  &gt;<br>

                  &gt; Simon<br>

                  &gt;<br>

                  &gt;<br>

                  &gt; The University of Dundee is a registered Scottish

                  Charity, No: SC015096<br>

                  &gt; _______________________________________________<br>

                  &gt; ome-devel mailing list<br>

                  &gt; <a moz-do-not-send="true"

                    href="mailto:ome-devel@lists.openmicroscopy.org.uk">ome-devel@lists.openmicroscopy.org.uk</a><br>

                  &gt; <a moz-do-not-send="true"

                    href="http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel"

                    target="_blank">

http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel</a><br>


                </blockquote>

              </div>

              <br>

            </div>

          </div>

        </blockquote>

      </div>

      <br>

      <br>


      <br>

      <pre class="moz-signature" cols="72">-- 

Anatole Chessel, PhD

Research associate

University of Cambridge

Tennis Court Road, Cambridge CB2 1PD

tel: +44 (0)1223 334065</pre>

    </blockquote>

  </body>

</html>