[ome-devel] [cellprofiler-dev] Should we eject PIL? (hint: yes)
Jason Swedlow
jason at lifesci.dundee.ac.uk
Wed Oct 19 17:07:28 BST 2011
Dear All-
As Curtis mentions, OME as a whole is aware of the importance of maintaining the current version of Bio-Formats: it is used by a large number of people and underpins a lot of scientific work. Also, and perhaps most importantly, sending us example data so we can pin down where a problem lies is very helpful. Many times, we've found that performance problems stem from underlying assumptions we made in the file converters, and have nothing to do with the runtime environment.
That being said, there has been, and clearly will be, significant interest in a fully native C++ solution for Bio-Formats. In weighing the reasons to do this, performance, ease of use, integration and many other factors may (or may not) favour it, and all need to be considered. Regardless, here’s the bottom line: we aim to make Bio-Formats the single solution for scientific image access, and we'll do what is necessary to make that happen.
Curtis rightly points out that doing this work is a significant resource issue. Glencoe Software has committed significant salary resources to Bio-Formats over the last 4 years and has in fact been the main source of funding for work on Bio-Formats. This commitment will continue and may even grow. The Dundee OME group has been funded by the Wellcome Trust to provide a fully native Bio-Formats solution. This award also funds the Madison OME group to work on OME ImageJ functionality with Dundee, and a major goal of this work is robust support of Bio-Formats from all applications, regardless of language and interface. So, we have the resources to build tools for the community and to drive interoperability across the bioimaging sciences, and that includes ensuring native access to Bio-Formats.
Across the project, we are now filling the open positions, and will soon be able to build the detailed technical plans for this work. Feedback from you and the wider community on a native Bio-Formats solution is most welcome. We'll do what needs to be done to make Bio-Formats work for as many applications as possible. Keep an eye on the OME Roadmap pages (http://trac.openmicroscopy.org.uk/ome/roadmap) for information on our work.
In the meantime, send us your thoughts and comments. Please do continue to send us problem files, and we will, as always, identify any problems, fix whatever we can, and release these fixes in the latest builds.
As always, thanks for your support.
Cheers,
Jason
On 16 Oct 2011, at 21:06, Curtis Rueden wrote:
> Hi Anne et al.,
>
> I am CCing the OME-devel list, since your question is relevant there too.
>
> > Indeed, there is talk of creating a C-version of BioFormats. I'm copying Curtis Rueden of the U Wisc LOCI team that develops BioFormats in case he can comment on the timeline and/or likelihood.
>
> This has been a perennial issue since Bio-Formats began. As you know, we are very aware of the need for software to interoperate well across platforms and languages; fostering such interoperability is one of Bio-Formats's most important goals. Hence, there has been much discussion and consideration of how best to accomplish it.
>
> We believe that Java provides the best fit for the Bio-Formats project (see FAQ entry: Why is Bio-Formats written in Java?). Remaining cross-platform is a top priority for us, as is the integration with OMERO and ImageJ.
>
> We have explored many ways to provide access to Bio-Formats from C, C++ and Python (see FAQ entry: How can I invoke Bio-Formats from my language of choice?), and we now have multiple practical solutions that provide a starting point for developers (see Bio-Formats C++ Bindings, ITK integration, Lee Kamentsky's javabridge). But it is a complex issue, and very difficult to make fully turnkey. We know that more can be done, and we plan to expand these approaches and their documentation to make integrating with Bio-Formats from non-Java code as easy as possible.
>
> We have also discussed the idea of a C or C++ version of Bio-Formats, as you mention (see FAQ entry: Are you considering translating Bio-Formats to any other languages?). There are two possible approaches: 1) maintain a parallel version in C/C++; or 2) migrate Bio-Formats to C/C++ completely and provide Java bindings on top. Either way would require thousands of person-hours to translate the existing codebase into C/C++, but option #1 creates an especially pernicious maintenance problem: every change to the software, including both bugfixes and improvements, would need to be made twice, once for the Java version and once for the C/C++ version. We simply do not have the resources for that. That leaves option #2, which would essentially turn the current situation upside down: Java developers would be asking for a pure-Java version, because cross-platform integration with Bio-Formats would become much more difficult.
>
> Either way, before investing such a large amount of time, it is important to fully understand and agree on the rationale for the change. What goals are we trying to accomplish? Is it mainly to improve Bio-Formats's time performance? If so, then I assert that a C/C++ version would not be significantly faster, since Java's I/O performance is comparable to, or even better than, that of C/C++ (see FAQ entry: Isn't Java too slow?). Or is the major goal to ease integration with CellProfiler's Python codebase? If so, let's discuss further from that perspective.
>
> > Also, we were discussing whether the very slow performance on large image sets due to over-use of the stat() command was fixed, which I think it is.
>
> The best way to solve performance problems with Bio-Formats is to report a bug and send an example dataset. The vast majority of the time, the issue is with a specific file format reader having an inefficient algorithm, which could be easily improved given a test case illustrating the problem.
>
> Side note: if your team has the time to investigate performance issues themselves, the quickest way to find the bottleneck is often to read the problem file with the "showinf" command-line tool and press Ctrl+\ (Ctrl+Break on Windows) in the console during the slow initialization or plane extraction. A full Java stack trace will appear, including line numbers, which generally identifies the bottleneck section of code. Optimizing these sections of code is pretty much the same in every programming language: avoid repeated small unbuffered reads, avoid unnecessary seeks, avoid unnecessary nested loops and "per-pixel" operations, etc.
>
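The I/O advice in the side note above applies in any language. As a minimal sketch in Python (the language of CellProfiler's codebase), assuming a hypothetical raw file that stores one little-endian, unsigned 16-bit plane after a fixed-size header, here is the slow pattern next to the fix. The function names and the file layout are illustrative only; this is not Bio-Formats code or API.

```python
import array
import os
import struct
import sys
import tempfile


def read_plane_per_pixel(path, offset, width, height):
    """Slow pattern: one seek plus one 2-byte unbuffered read per pixel."""
    pixels = []
    # buffering=0: every f.read() goes straight to the operating system.
    with open(path, "rb", buffering=0) as f:
        for i in range(width * height):
            f.seek(offset + 2 * i)  # unnecessary: the reads are sequential anyway
            (value,) = struct.unpack("<H", f.read(2))
            pixels.append(value)
    return pixels


def read_plane_bulk(path, offset, width, height):
    """Faster pattern: one seek and one contiguous read for the whole plane."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(2 * width * height)
    plane = array.array("H")  # unsigned 16-bit values (2 bytes on common platforms)
    plane.frombytes(data)  # decode the whole plane at once, no per-pixel loop
    if sys.byteorder != "little":
        plane.byteswap()  # the hypothetical file layout is little-endian
    return plane.tolist()


def write_sample_file(values, offset):
    """Write a hypothetical raw file: `offset` zero header bytes, then uint16 pixels."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * offset)
        tmp.write(struct.pack("<%dH" % len(values), *values))
        return tmp.name


if __name__ == "__main__":
    width, height, offset = 4, 3, 16
    expected = list(range(width * height))
    path = write_sample_file(expected, offset)
    try:
        assert read_plane_per_pixel(path, offset, width, height) == expected
        assert read_plane_bulk(path, offset, width, height) == expected
        print("both readers agree")
    finally:
        os.remove(path)
```

Both functions return the same pixels. The difference is that the first issues a seek and a tiny system call for every pixel, while the second issues one of each per plane, which is the kind of hotspot the showinf stack-trace trick is meant to expose.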
> In summary, we are committed to doing whatever we can to facilitate the use of Bio-Formats in a broad array of contexts. If that means a full language translation of Bio-Formats, then so be it. However, our current belief is that such an effort would be neither advantageous nor time-efficient compared to other solutions.
>
> Regards,
> Curtis
>
>
> On Sat, Oct 15, 2011 at 6:54 AM, Anne Carpenter <anne at broadinstitute.org> wrote:
> Indeed, there is talk of creating a C-version of BioFormats. I'm
> copying Curtis Rueden of the U Wisc LOCI team that develops BioFormats
> in case he can comment on the timeline and/or likelihood.
>
> (Curtis, we've been discussing the image reading libraries underlying
> CellProfiler, currently BioFormats + PIL, and the idea was raised of
> adding use of the Bio-Image C/C++ library which may be faster in some
> instances. We were wondering about the timeline of BioFormats being done in
> C/C++ and also whether there is any existing or planned collaboration
> with the Bio-Image project. Also, we were discussing whether the very
> slow performance on large image sets due to over-use of the stat()
> command was fixed, which I think it is.)
> http://biodev.ece.ucsb.edu/projects/imgcnv/wiki/libioimg
>
> Anne
**************************
Wellcome Trust Centre for Gene Regulation & Expression
College of Life Sciences
MSI/WTB/JBC Complex
University of Dundee
Dow Street
Dundee DD1 5EH
United Kingdom
phone (01382) 385819
Intl phone: 44 1382 385819
FAX (01382) 388072
email: jason at lifesci.dundee.ac.uk
Lab Page: http://www.lifesci.dundee.ac.uk/gre/staff/jason-swedlow
Open Microscopy Environment: http://openmicroscopy.org
**************************
The University of Dundee is a Scottish Registered Charity, No. SC015096.