Hi Anne et al.,<br><br>I am CCing the OME-devel list, since your question is relevant there too.<br>
<br><blockquote style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;" class="gmail_quote">Indeed, there is talk of creating a C-version of BioFormats. I'm copying Curtis Rueden of the U Wisc LOCI team that develops BioFormats in case he can comment on the timeline and/or likelihood.<br>
</blockquote><br>This has been a perennial issue since Bio-Formats began. As you know, we are very cognizant of the need for software to interoperate well cross-platform and cross-language—fostering such interoperability is one of Bio-Formats's most important goals. Hence, there has been much discussion and consideration of how best to accomplish it.<br>
<br>We believe that Java provides the best fit for the Bio-Formats project (see FAQ entry: <a href="https://www.openmicroscopy.org/site/support/faq/bio-formats/why-is-bio-formats-written-in-java">Why is Bio-Formats written in Java?</a>). Remaining cross-platform is a top priority for us, as is the integration with OMERO and ImageJ.<br>
<br>We have explored many ways to provide access to Bio-Formats from C, C++ and Python (see FAQ entry: <a href="https://www.openmicroscopy.org/site/support/faq/bio-formats/how-can-i-invoke-bio-formats-from-my-language-of-choice">How can I invoke Bio-Formats from my language of choice?</a>), and we now have multiple practical solutions which provide a starting point for developers (see <a href="http://loci.wisc.edu/bio-formats/bio-formats-c-bindings">Bio-Formats C++ Bindings</a>, <a href="http://loci.wisc.edu/bio-formats/itk">ITK integration</a>, Lee Kamentsky's <a href="http://pypi.python.org/pypi/javabridge/0.2">javabridge</a>). But it is a complex issue, very difficult to make totally turn-key. We know that more can be done, and we have plans to expand these approaches and documentation to make it as easy as possible to integrate with Bio-Formats from non-Java code.<br>
<br>We have also discussed the idea of a C or C++ version of Bio-Formats, as you mention (see FAQ entry: <a href="https://www.openmicroscopy.org/site/support/faq/bio-formats/are-you-considering-translating-bio-formats-to-any-other-languages">Are you considering translating Bio-Formats to any other languages?</a>). There are two possible approaches: 1) maintain a parallel version in C/C++; or 2) migrate Bio-Formats to C/C++ completely and provide Java bindings on top. Either way would require thousands of person-hours to translate the existing codebase into C/C++, but option #1 creates an especially pernicious maintenance problem in that changes to the software—including both bugfixes and improvements—would need to be done twice, once for the Java version and once for the C/C++ version. We simply do not have the resources for that. So that leaves option #2, which would essentially turn the current situation upside down: Java developers would be asking for a pure-Java version, because cross-platform integration with Bio-Formats would become much more difficult.<br>
<br>Either way, before investing such a large amount of time, it is important to fully understand and agree on the rationale for the change. What goals are we trying to accomplish? Is it mainly to improve Bio-Formats's time performance? If so, then I assert that a C/C++ version would not be significantly faster, since <a href="http://drdobbs.com/java/184401976?pgno=15">Java's I/O performance is comparable or even superior to C/C++'s</a> (see FAQ entry: <a href="https://www.openmicroscopy.org/site/support/faq/bio-formats/isnt-java-too-slow">Isn't Java too slow?</a>). Or is the major goal to ease integration with CellProfiler's Python codebase? If so, let's discuss further from that perspective.<br>
<br><blockquote style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;" class="gmail_quote">Also, we were discussing whether the very slow performance on large image sets due to over-use of the stat() command was fixed, which I think it is.<br>
</blockquote><br>The best way to solve performance problems with Bio-Formats is to report a bug and send an example dataset. The vast majority of the time, the issue is with a specific file format reader having an inefficient algorithm, which could be easily improved given a test case illustrating the problem.<br>
<br>Side note: if your team has the time to investigate performance issues themselves, the quickest way to find the bottleneck is often to read the problem file with the "showinf" command line tool, and press Ctrl+\ in the console during the slow initialization or plane extraction. A full Java stack trace will appear including line numbers, which generally identifies the bottleneck section of code. Optimizing these sections of code is pretty much the same in every programming language: avoid repeated small unbuffered reads, avoid unnecessary seeks, avoid unnecessary nested loops and "per-pixel" operations, etc.<br>
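<br>To make the "repeated small unbuffered reads" point concrete, here is a minimal sketch using only the standard java.io classes. It is not Bio-Formats code, and the names BufferedReadSketch, CountingStream, sumBytes and compare are made up for illustration. It counts how many read calls reach the underlying stream when the same byte-at-a-time loop runs with and without a BufferedInputStream in between; with a real file, each of those calls would be a trip to the operating system.<br>

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedReadSketch {

  /** Hypothetical helper: wraps a stream and counts the read calls that reach it. */
  static class CountingStream extends InputStream {
    private final InputStream in;
    int calls = 0;

    CountingStream(InputStream in) { this.in = in; }

    @Override public int read() throws IOException {
      calls++;
      return in.read();
    }

    @Override public int read(byte[] b, int off, int len) throws IOException {
      calls++;
      return in.read(b, off, len);
    }
  }

  /** Sums all bytes, deliberately reading one byte at a time. */
  static long sumBytes(InputStream in) throws IOException {
    long sum = 0;
    int b;
    while ((b = in.read()) != -1) sum += b;
    return sum;
  }

  /** Returns {calls without buffering, calls with buffering} for n bytes of data. */
  static int[] compare(int n) throws IOException {
    byte[] data = new byte[n];
    for (int i = 0; i < n; i++) data[i] = (byte) i;

    // Case 1: every single-byte read goes straight to the underlying stream.
    CountingStream raw = new CountingStream(new ByteArrayInputStream(data));
    long s1 = sumBytes(raw);

    // Case 2: same loop, but an 8 KB buffer sits in between and refills in bulk.
    CountingStream underlying = new CountingStream(new ByteArrayInputStream(data));
    long s2 = sumBytes(new BufferedInputStream(underlying, 8192));

    if (s1 != s2) throw new IllegalStateException("sums differ");
    return new int[] { raw.calls, underlying.calls };
  }

  public static void main(String[] args) throws IOException {
    int[] calls = compare(1 << 20);
    System.out.println("read calls without buffering: " + calls[0]);
    System.out.println("read calls with buffering:    " + calls[1]);
  }
}
```

The calling code is identical in both cases; only the wrapper differs, and the buffered variant should report far fewer calls on the underlying stream. The same reasoning applies to a C/C++ reader, which is why this kind of fix tends to matter more than the choice of language.<br>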
<br>In summary, we are committed to doing whatever we can to facilitate the use of Bio-Formats in a broad array of contexts. If that means a full language translation of Bio-Formats, then so be it. However, our current belief is that such an effort would be neither advantageous nor time-efficient compared to other solutions.<br>
<br>Regards,<br>Curtis<br><br><br><div class="gmail_quote">On Sat, Oct 15, 2011 at 6:54 AM, Anne Carpenter <span dir="ltr">&lt;<a href="mailto:anne@broadinstitute.org">anne@broadinstitute.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Indeed, there is talk of creating a C-version of BioFormats. I'm<br>
copying Curtis Rueden of the U Wisc LOCI team that develops BioFormats<br>
in case he can comment on the timeline and/or likelihood.<br>
<br>
(Curtis, we've been discussing the image reading libraries underlying<br>
CellProfiler, currently BioFormats + PIL, and the idea was raised of<br>
adding use of the Bio-Image C/C++ library which may be faster in some<br>
instances. We were wondering the timeline of BioFormats being done in<br>
C/C++ and also whether there is any existing or planned collaboration<br>
with the Bio-Image project. Also, we were discussing whether the very<br>
slow performance on large image sets due to over-use of the stat()<br>
command was fixed, which I think it is.)<br>
<a href="http://biodev.ece.ucsb.edu/projects/imgcnv/wiki/libioimg" target="_blank">http://biodev.ece.ucsb.edu/projects/imgcnv/wiki/libioimg</a><br>
<font color="#888888"><br>
Anne<br>
</font></blockquote></div><br>