<div dir="ltr"><div>Hi Simon, all,</div>Just some quick answers to some parts:<div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 1, 2013 at 1:42 PM, Simon Li <span dir="ltr">&lt;<a href="mailto:s.p.li@dundee.ac.uk" target="_blank">s.p.li@dundee.ac.uk</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">
<div>
<div>Hi all</div>
<div><br>
</div>
<div><br></div>
<div>2. Feature calculation</div>
<div><br>
</div>
<div>Coming up with a way to record a whole workflow to support reproducible research is going to be a very big undertaking, though a standardised ROI and feature storage specification will be a big step. Is there any way, at least in the short term, we could
take advantage of, say, some of the components in CellProfiler or KNIME to record our analysis pipeline?<br></div></div></div></blockquote><div><br></div><div>For reproducibility in CellProfiler, you'd need to save the pipeline file (which controls the analysis) and the Git hash of the CellProfiler version that ran it. I think that's a good first cut. CellProfiler outputs its own Git hash as an analysis-wide feature.</div>
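<div><br></div><div>To make that concrete, here is a rough sketch of how the two pieces could be bundled into one small record. It is only an illustration: the helper names and the JSON layout are made up for this email, not anything CellProfiler provides, and the Git hash is simply passed in by the caller (for example, copied from the analysis output).</div>

```python
import hashlib
import json
from pathlib import Path


def provenance_record(pipeline_path, cellprofiler_git_hash):
    """Bundle what is needed to rerun an analysis (hypothetical helper)."""
    data = Path(pipeline_path).read_bytes()
    return {
        "pipeline_file": Path(pipeline_path).name,
        # Checksum of the saved pipeline file, so later edits are detectable.
        "pipeline_sha256": hashlib.sha256(data).hexdigest(),
        # Git hash of the CellProfiler version used, supplied by the caller.
        "cellprofiler_git_hash": cellprofiler_git_hash,
    }


def save_provenance(record, out_path):
    """Write the record as JSON next to the results (hypothetical layout)."""
    Path(out_path).write_text(json.dumps(record, indent=2, sort_keys=True))
```

<div>Keeping the record next to the stored features would let anyone check later that they have the same pipeline file and the same CellProfiler version.</div>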
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div></div>
<div><br>
</div>
<div><br></div>
<div>3. Feature retrieval</div>
<div><br>
</div>
<div>Chris' suggestion of requesting features in the form of ([feature names], [object ids]) looks sensible. Vebjorn brings up a very good point about retrieving both row and column slices efficiently. When I spoke with Lee a few months ago in Dundee he said
feature sets are often frozen after calculation, so one option would be to have a post-feature-calculation task to optimise the storage format, if necessary duplicating the data so both row and column operations are fast. Effectively we'd have multiple implementations
of the same API, transparent to the client.</div>
<div><br></div></div></div></blockquote><div>Thanks Simon - good idea. </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div>
</div>
<div><br>
</div>
<div>In the interests of getting things moving I think concentrating on feature storage/retrieval first might be better than trying to do everything at once. </div></div></div></blockquote><div> </div><div>Again good, looking forward to it.</div>
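<div>On the retrieval side, the ([feature names], [object ids]) request over a frozen, duplicated feature set could look roughly like the toy sketch below. Everything in it is an assumption for illustration only: the class name, the in-memory lists, and the rule for choosing which copy to read are not part of any existing API.</div>

```python
class FrozenFeatureStore:
    """Toy in-memory store for a frozen feature set (hypothetical names)."""

    def __init__(self, feature_names, object_ids, rows):
        self.feature_names = list(feature_names)
        self.object_ids = list(object_ids)
        self._col = {name: j for j, name in enumerate(self.feature_names)}
        self._row = {oid: i for i, oid in enumerate(self.object_ids)}
        # Row-major copy: one list per object, good for whole-object reads.
        self._by_object = [list(r) for r in rows]
        # Column-major duplicate: one list per feature, good for column reads.
        self._by_feature = [list(c) for c in zip(*self._by_object)]

    def get(self, feature_names, object_ids):
        """Return one row per requested object, columns in requested order."""
        cols = [self._col[n] for n in feature_names]
        rows = [self._row[o] for o in object_ids]
        if len(cols) > len(rows):
            # More features than objects requested: read the row-major copy.
            return [[self._by_object[i][j] for j in cols] for i in rows]
        # Otherwise read the column-major copy.
        picked = [self._by_feature[j] for j in cols]
        return [[col[i] for col in picked] for i in rows]
```

<div>The client only ever calls get(); which copy serves the request stays hidden, which is the "multiple implementations of the same API" idea.</div>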
<div><br></div><div>--Lee </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>
<div><br></div></div></div></blockquote></div></div></div>