[ome-devel] [cellprofiler-dev] Should we eject PIL? (hint: yes)

Curtis Rueden ctrueden at wisc.edu
Sun Oct 16 21:06:11 BST 2011


Hi Anne et al.,

I am CCing the OME-devel list, since your question is relevant there too.

> Indeed, there is talk of creating a C-version of BioFormats.  I'm copying
> Curtis Rueden of the U Wisc LOCI team that develops BioFormats in case he
> can comment on the timeline and/or likelihood.
>

This has been a perennial issue since Bio-Formats began. As you know, we are
very cognizant of the need for software to interoperate well cross-platform
and cross-language—fostering such interoperability is one of Bio-Formats's
most important goals. Hence, there has been much discussion and
consideration of how best to accomplish it.

We believe that Java provides the best fit for the Bio-Formats project (see
FAQ entry: Why is Bio-Formats written in
Java?<https://www.openmicroscopy.org/site/support/faq/bio-formats/why-is-bio-formats-written-in-java>).
Remaining cross-platform is a top priority for us, as is the integration
with OMERO and ImageJ.

We have explored many ways to provide access to Bio-Formats from C, C++ and
Python (see FAQ entry: How can I invoke Bio-Formats from my language of
choice?<https://www.openmicroscopy.org/site/support/faq/bio-formats/how-can-i-invoke-bio-formats-from-my-language-of-choice>),
and we now have multiple practical solutions which provide a starting point
for developers (see Bio-Formats C++
Bindings<http://loci.wisc.edu/bio-formats/bio-formats-c-bindings>,
ITK integration <http://loci.wisc.edu/bio-formats/itk>, Lee Kamentsky's
javabridge <http://pypi.python.org/pypi/javabridge/0.2>). But it is a
complex issue, very difficult to make totally turn-key. We know that more
can be done, and we have plans to expand these approaches and documentation
to make it as easy as possible to integrate with Bio-Formats from non-Java
code.

We have also discussed the idea of a C or C++ version of Bio-Formats, as you
mention (see FAQ entry: Are you considering translating Bio-Formats to any
other languages?<https://www.openmicroscopy.org/site/support/faq/bio-formats/are-you-considering-translating-bio-formats-to-any-other-languages>).
There are two possible approaches: 1) maintain a parallel version in C/C++;
or 2) migrate Bio-Formats to C/C++ completely and provide Java bindings on
top. Either way would require thousands of person-hours to translate the
existing codebase into C/C++, but option #1 creates an especially pernicious
maintenance problem in that changes to the software—including both bugfixes
and improvements—would need to be done twice, once for the Java version and
once for the C/C++ version. We simply do not have the resources for that. So
that leaves option #2, which would essentially turn the current situation
upside down: Java developers would be asking for a pure-Java version,
because cross-platform integration with Bio-Formats would become much more
difficult.

Either way, before investing such a large amount of time, it is important to
fully understand and agree on the rationale for the change. What goals are
we trying to accomplish? Is it mainly to improve Bio-Formats's time
performance? If so, then I assert that a C/C++ version would not be
significantly faster, since Java's I/O performance is comparable or even
superior to C/C++'s <http://drdobbs.com/java/184401976?pgno=15> (see FAQ
entry: Isn't Java too
slow?<https://www.openmicroscopy.org/site/support/faq/bio-formats/isnt-java-too-slow>).
Or is the major goal to ease integration with CellProfiler's Python
codebase? If so, let's discuss further from that perspective.

> Also, we were discussing whether the very slow performance on large image
> sets due to over-use of the stat() command was fixed, which I think it is.
>

The best way to solve performance problems with Bio-Formats is to report a
bug and send an example dataset. The vast majority of the time, the issue is
with a specific file format reader having an inefficient algorithm, which
could be easily improved given a test case illustrating the problem.

Side note: if your team has the time to investigate performance issues
themselves, the quickest way to find the bottleneck is often to read the
problem file with the "showinf" command line tool, and press Ctrl+\ in the
console during the slow initialization or plane extraction. A full Java
stack trace will appear including line numbers, which generally identifies
the bottleneck section of code. Optimizing these sections of code is
pretty much the same in every programming language: avoid repeated small
unbuffered reads, avoid unnecessary seeks, avoid unnecessary nested loops
and "per-pixel" operations, etc.
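
To make the "repeated small unbuffered reads" point concrete, here is a
minimal Java sketch. It is an illustration only, not code from Bio-Formats:
the class names BufferedReadSketch and CountingInputStream are hypothetical,
and an in-memory byte array stands in for an image file. It counts how many
read calls actually reach the underlying source when bytes are requested one
at a time, with and without a buffer in front of it.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedReadSketch {

    /** Hypothetical helper: counts read calls that reach the wrapped source. */
    static class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads the stream one byte at a time until end of stream. */
    static long drain(InputStream in) throws IOException {
        long n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    /** Calls reaching the source when every byte is requested individually. */
    static long unbufferedCalls(byte[] data) throws IOException {
        CountingInputStream source =
            new CountingInputStream(new ByteArrayInputStream(data));
        drain(source);
        return source.calls;
    }

    /** Calls reaching the source when an 8 KB buffer sits in front of it. */
    static long bufferedCalls(byte[] data) throws IOException {
        CountingInputStream source =
            new CountingInputStream(new ByteArrayInputStream(data));
        drain(new BufferedInputStream(source, 8192));
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // stand-in for 1 MB of pixel data
        System.out.println("unbuffered calls: " + unbufferedCalls(data));
        System.out.println("buffered calls:   " + bufferedCalls(data));
    }
}
```

In the unbuffered case every single-byte request hits the source, whereas
the buffered case only hits it once per refill of the 8 KB buffer. With a
real file or network stream each of those calls can be a system call, which
is where the slowdown tends to come from.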

In summary, we are committed to doing whatever we can to facilitate the use
of Bio-Formats in a broad array of contexts. If that means a full language
translation of Bio-Formats, then so be it. However, our current belief is
that such an effort would be neither advantageous nor time-efficient
compared to other solutions.

Regards,
Curtis


On Sat, Oct 15, 2011 at 6:54 AM, Anne Carpenter <anne at broadinstitute.org> wrote:

> Indeed, there is talk of creating a C-version of BioFormats.  I'm
> copying Curtis Rueden of the U Wisc LOCI team that develops BioFormats
> in case he can comment on the timeline and/or likelihood.
>
> (Curtis, we've been discussing the image reading libraries underlying
> CellProfiler, currently BioFormats + PIL, and the idea was raised of
> adding use of the Bio-Image C/C++ library which may be faster in some
> instances. We were wondering the timeline of BioFormats being done in
> C/C++ and also whether there is any existing or planned collaboration
> with the Bio-Image project. Also, we were discussing whether the very
> slow performance on large image sets due to over-use of the stat()
> command was fixed, which I think it is.)
> http://biodev.ece.ucsb.edu/projects/imgcnv/wiki/libioimg
>
> Anne
>