<div dir="ltr">Hi Ivan<br><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Aug 27, 2013 at 2:26 PM, Ivan E. Cao-Berg <span dir="ltr">&lt;<a href="mailto:icaoberg@andrew.cmu.edu" target="_blank">icaoberg@andrew.cmu.edu</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">hi! everything sounds great. i have a couple of questions and comments.<br><br>

&gt; 2. ROI Preprocessing options must be an essential part of the feature<br>

&gt; storage framework, since a single ROI with different preprocessing can<br>

&gt; result in different feature values.<br>

<br>

totally agree. then the question i ask is, how will this affect the<br>

community as a whole? i haven&#39;t used CellProfiler, i have only used<br>

knime. can we guarantee numerical accuracy across any system given a<br>

particular version of the software? (not rhetorical, i have no idea).<br>

<br></blockquote><div>For CellProfiler, absolute reproducibility is a goal, but we don&#39;t specify our dependencies precisely enough to guarantee this across platforms. Pragmatically, we have taken some care to make calculations reproducible (fixed seeds for pseudo-random numbers, etc.), and the results should be comparable across platforms. Someday, we&#39;ll reach that goal...</div>
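<div>As a small illustration of the seeding point (a sketch only, not CellProfiler code; the function name and the numbers are made up), fixing the seed of a pseudo-random generator makes a sampling step repeatable from run to run:</div>
<pre>
```python
import random


def sample_pixels(n_pixels, n_samples, seed):
    """Pick pixel indices with a private, explicitly seeded generator."""
    rng = random.Random(seed)  # no dependence on global random state
    return sorted(rng.sample(range(n_pixels), n_samples))


# Two runs with the same seed select the same pixels.
first = sample_pixels(10000, 5, seed=42)
second = sample_pixels(10000, 5, seed=42)
print(first == second)
```
</pre>
<div>This only covers the random part of a calculation; differences in numerical libraries between platforms can still change the last digits of a result.</div>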

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">how easy will it be to exchange that information between systems?<br></blockquote><div>I think that there is too much leeway in the implementation of algorithms to have the feature output of one software package exactly match that of another. Perhaps at some point there will be an exact ontology of features that can guarantee the same result from different systems, but I am guessing it&#39;s too early for that effort. </div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

do we want to share a feature table/vector or do we want to share a process<br>

that can run on their own system?<br></blockquote><div><br></div><div>In the context of CellProfiler, my plan is for CellProfiler to access the OMERO server from a client process or from multiple client processes run on a cluster. CellProfiler would upload features to OMERO in this scenario. I think it would be pretty cool if an OMERO client could request a CellProfiler analysis on an OMERO server, but that&#39;s not currently on our schedule.</div>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

even though these issues seem unimportant, i think they are important. if people are<br>

going to be publishing data online along with<br>

their research articles, reproducibility of certain calculations, say<br>

feature values, is very important. how can we guarantee people can<br>

reproduce<br>

those results?<br>

<br><br>

other things i would like to point out<br>

1) in terms of feature calculation it is essential that we keep track of<br>

the resolution at which features were calculated<br></blockquote><div>CellProfiler also needs some mechanism for annotating a feature vector with the parameterization that&#39;s needed to reproduce the analysis.</div><div>

 </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

2) we should have a clear method that just links features to a database.<br>

some people will not want to recalculate features if they have already<br>

done it. some feature sets are computationally expensive<br></blockquote><div>A CellProfiler analysis of a field of view typically takes on the order of a minute to compute, so I don&#39;t think recalculating on the fly is a good option for us.</div>
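<div>To make the annotation idea from this thread concrete, here is a rough sketch of a stored feature vector that carries its preprocessing options, resolution and parameterization, so it can be looked up instead of recomputed. Everything in it is an assumption for illustration: the class, the field names, the tool name and the values are hypothetical and are not part of CellProfiler or OMERO.</div>
<pre>
```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class FeatureRecord:
    """A feature vector plus the settings needed to reproduce it (hypothetical layout)."""
    roi_id: str
    software: str
    software_version: str
    pixel_size_um: float   # resolution at which the features were computed
    preprocessing: dict    # ROI preprocessing options
    parameters: dict       # algorithm settings
    values: dict           # feature name to value

    def provenance_key(self):
        """Hash everything except the values, so equal settings give equal keys."""
        meta = asdict(self)
        del meta["values"]
        canonical = json.dumps(meta, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same ROI, different preprocessing: stored as two distinct entries (made-up numbers).
raw = FeatureRecord("roi-1", "ExampleTool", "0.0.0", 0.32,
                    {"smoothing": "none"}, {"texture_scale": 3},
                    {"mean_intensity": 0.41})
smoothed = FeatureRecord("roi-1", "ExampleTool", "0.0.0", 0.32,
                         {"smoothing": "gaussian", "sigma": 1.0}, {"texture_scale": 3},
                         {"mean_intensity": 0.39})

# A plain dict stands in for the database table.
store = {rec.provenance_key(): rec.values for rec in (raw, smoothed)}
print(len(store))
```
</pre>
<div>A client that already knows the settings it wants can build the key and fetch the stored values, and only falls back to computing when the key is missing.</div>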

<div><br></div><div>--Lee</div></div></div></div>