[ome-devel] Fwd: Bio-Formats/C++/Python

Curtis Rueden ctrueden at wisc.edu
Tue Oct 5 23:13:18 BST 2010


Hi Michael, Jason & others,

> The access to image formats via the Java-based bioformats has a big
> performance issue when accessing from C++/Python (or any non-Java system).
> People at Sybit/Switzerland tried their own Jace-based wrapper via BLITZ
> (libBlitzBioFormats), but had to improve Jace first, since the import of
> ONE! 512x512 image took ~60s.


I don't know much about OMERO.blitz, but my understanding is that it is a
way of doing inter-process integration across languages and machines.
By contrast, Jace provides a way to do Java/C++ integration in-process.
Hence, I am not sure what you mean by a "Jace-based wrapper via BLITZ."

The solutions we currently provide are:

1) In-process: use bf-cpp, the Bio-Formats C++ bindings (which use Jace).
2) Inter-process: access Bio-Formats over Ice. Currently not maintained.

I wrote a detailed web page discussing these options and more at:
  http://www.loci.wisc.edu/bio-formats/interfacing-non-java-code

Together with some pages on Codemesh's web site (particularly:
http://codemesh.com/in_process.html, http://codemesh.com/technology.html),
it is a good summary of the pros and cons of these various approaches.

I spent a lot of time on bf-cpp and I can say with confidence that
performance is fairly comparable to running in pure Java as long as you take
care to avoid JNI calls per pixel.

E.g., if you have a ByteArray (java byte[]) called buf, rather than
accessing buf[i] in a for loop, use JNI's "GetByteArrayRegion" method:

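// Allocate a native buffer, then copy the whole plane in a single JNI call.
jbyte* jData = new jbyte[bytesPerPlane];
JNIEnv* env = jace::helper::attach();
jbyteArray jArray = static_cast<jbyteArray>(buf.getJavaJniArray());
env->GetByteArrayRegion(jArray, 0, bytesPerPlane, jData);
// ... use jData, then release it with delete[] jData.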


This copies the Java array into a data block in the C++ application's
memory—the jData pointer can then be cast to whatever type you wish.
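
For Python users, the analogous step can be sketched as follows: once a whole plane has arrived as a single bytes object, reinterpret it in one operation rather than element by element. This is only a minimal illustration; the helper name `plane_to_uint16` and the little-endian default are assumptions, not part of Bio-Formats or bf-cpp.

```python
import struct


def plane_to_uint16(raw, little_endian=True):
    """Reinterpret a raw byte buffer as unsigned 16-bit pixel values.

    Hypothetical helper: assumes the plane was already copied out of the
    JVM in one bulk call and that its byte order is known.
    """
    if len(raw) % 2 != 0:
        raise ValueError("buffer length must be a multiple of 2 bytes")
    count = len(raw) // 2
    order = "<" if little_endian else ">"
    # One unpack call converts the entire buffer.
    return list(struct.unpack(f"{order}{count}H", raw))


pixels = plane_to_uint16(bytes([0x01, 0x00, 0xFF, 0x00]))
```

The byte order has to come from the file's metadata; guessing it wrong silently swaps the two bytes of every pixel.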

This advice goes for whatever integration solution you use, be it in-process
or inter-process: treat the communication layer between Java and non-Java as
a bottleneck, and minimize method calls across that bridge.
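
To make the point concrete, here is a toy sketch that counts how many times each access pattern crosses the bridge. The `CountingBridge` class is purely hypothetical and stands in for whatever boundary you use (JNI, Ice, sockets); it models only the call count, not the real per-call overhead.

```python
class CountingBridge:
    """Stand-in for a Java/native boundary that counts crossings.

    Hypothetical class for illustration only; a real bridge pays a
    fixed overhead on every call.
    """

    def __init__(self, data):
        self._data = bytes(data)
        self.calls = 0

    def get_byte(self, index):
        # One crossing per pixel.
        self.calls += 1
        return self._data[index]

    def get_region(self, start, length):
        # One crossing for the whole block.
        self.calls += 1
        return self._data[start:start + length]


def read_per_pixel(bridge, n):
    return bytes(bridge.get_byte(i) for i in range(n))


def read_bulk(bridge, n):
    return bridge.get_region(0, n)


plane = bytes(range(256)) * 4  # 1024 bytes standing in for image data
slow = CountingBridge(plane)
fast = CountingBridge(plane)
assert read_per_pixel(slow, len(plane)) == read_bulk(fast, len(plane))
print(slow.calls, fast.calls)  # prints "1024 1"
```

Both functions return identical data; only the number of crossings differs, and that number grows with image size in the per-pixel case.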

You can view some working examples at:

http://dev.loci.wisc.edu/trac/software/browser/trunk/components/native/bf-itk/itkBioFormatsImageIO.cxx

http://dev.loci.wisc.edu/trac/software/browser/trunk/components/native/bf-cpp/source/showinf.cpp

http://dev.loci.wisc.edu/trac/software/browser/trunk/components/native/bf-cpp/source/minimum_writer.cpp

We use bf-cpp daily in WiscScan (LOCI's internal acquisition software) to
produce OME-TIFF. We also use bf-cpp as part of our Bio-Formats/ITK/FARSIGHT
integration, which is completely functional. And the V3D team uses bf-cpp
for their V3D Bio-Formats plugin. So I think the bindings are reasonably
useful and performant.

> CellProfiler has its own way by launching a Java VM from Python.


This is true. As for why Lee didn't use the C++ bindings, my understanding
is that he felt that bf-cpp was unnecessarily complex for what he was trying
to do, and he wanted to avoid the dependency on boost-thread. However, as a
consequence his solution is very limited in scope—e.g., it does not scale
well to more complex API calls, and it may have problems in a multi-threaded
application.

> The simple question behind this is: how to make the access to bioformats
> simpler and faster, which is an issue for the growing Python community.


Could you please clarify whether your colleagues were attempting to use
bf-cpp, or some other solution? I will second Jason's suggestion that we
work together, and add that clear communication is crucial. If you are
having difficulty using Bio-Formats from Python or C++, let us know the
details so that we can help troubleshoot, and improve the technology as
needed.

In the case of bf-cpp, I must apologize for the lack of documentation—I
haven't fleshed out a dedicated web page for bf-cpp yet, other than the
build instructions on the FARSIGHT wiki:

http://www.farsight-toolkit.org/wiki/FARSIGHT_Tutorials/Building_Software/Bio-Formats/Building_C%2B%2B_Bindings

> I see two solutions for that.
> 1. Using the CellProfiler implementation as a standalone package.
> Performance is unknown to me. Short term issue.


I would caution against this approach. I don't think Lee intended the
CellProfiler implementation to be used as an external library. And I have
reservations about supporting two separate in-process solutions for
Bio-Formats.

That said, Lee did tell me the incantation needed to use the CP Bio-Formats
module:
  from cellprofiler.modules.loadimages import load_using_bioformats

> 2. I was wondering with Carolina Wählby from the Broad how much work it
> really is to collect the most needed formats for HC/HT screening and rewrite
> bioformats as a pure C++ library using the highly developed
> libtiff/libpng/libjpeg while providing a Python interface.


We have heard this sort of proposal before—e.g., from proponents of ITK and
the BSD license—and it seems to stem from language and/or licensing
preferences more than anything else. The reality is: if it's written in C++
you need Java wrappers to call from Java programs, and vice versa. There is
no way to escape it as long as this dichotomy between C++ and Java
exists. And I must strongly caution that even if you did translate
portions of Bio-Formats to C++, you are unlikely to see a substantial
performance benefit in either space or time, and certainly not enough to
justify the time needed for the effort.

As for time needed, Bio-Formats has been approximately 10-15 man-years of
work so far, and it is reasonable to assume a language translation to C++
(even for only a subset of formats) would take at least a few man-years. And
that doesn't address the subsequent issue of maintaining a forked codebase.

> For TIFF derivative formats (and there are many) this would be a simple job
> and there are C++ libs out there solving the problem already.


Beware that some commercial TIFF variants violate the TIFF standard, which
can cause problems with libtiff. That said, libtiff is great and if someone
did want to support all these formats from C++, using libtiff whenever
possible would be the way to go.

Another approach people have used successfully is invoking Java/Bio-Formats
via system calls, and reading the results via stdout or from a file. This is
the integration approach we used with the OME Perl server, where it worked
very well.
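
From Python, that approach might look roughly like the sketch below: launch a child process, capture its stdout, and parse it. The helper name `run_and_parse` and the "key = value" output format are assumptions for illustration. So that the sketch runs without Java installed, a Python child process stands in for the Java command line.

```python
import subprocess
import sys


def run_and_parse(command):
    """Run an external command and parse 'key = value' lines from stdout.

    Hypothetical helper: the command and its output format are assumptions.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    metadata = {}
    for line in result.stdout.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


# Stand-in child process; a real call would launch the JVM here instead.
fake_tool = [sys.executable, "-c", "print('Width = 512'); print('Height = 512')"]
info = run_and_parse(fake_tool)
```

Each invocation pays the full JVM startup cost, so this pattern suits one call per file far better than one call per plane.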

> I guess CellProfiler has the same problem. Any opinions?


My understanding is that Lee and Adam (the CP developers) have solved the
Java integration issue from CP at this point, both for Bio-Formats and for
ImageJ, and for all three major platforms (Windows, Mac OS X, Linux). There
were issues with AWT calls from native code on Mac OS X, but they were
resolved with a custom inter-process solution using sockets. So if there are
any outstanding roadblocks there, I don't know of them.

In conclusion, thanks for your feedback—I'm glad it made it to a public
list. In the future, it would be great to keep an open line of
communication, so that we can help to solve your problems and improve the
quality of the software.

-Curtis

On Fri, Oct 1, 2010 at 11:45 AM, Jason Swedlow
<jason at lifesci.dundee.ac.uk> wrote:

> Dear All-
>
> Michael Held (ETH Zurich) wrote this comment about Python implementation of
> Bio-Formats.  Any comments, feedback, or other similar experiences out
> there?
>
> In general, our own preference would be to ensure there is a **single**
> resource for file format translation.  Maintaining more than one just
> duplicates effort on something that is very difficult and tedious, even at
> the best of times.  While I do agree that making a Python HCS-only reader is
> possible, and probably not terribly hard, it's the maintenance and updates
> of this resource that is really time consuming in the end.  Until the
> various vendors coalesce around a single standard, that is just true.  So,
> if possible, let's reuse as much as we have, and work together on a
> **single** resource, whatever it is.
>
> Regarding the timings quoted, I'll let Curtis respond, but something sounds
> pretty wrong.
>
> Comments??
>
> Cheers,
>
> Jason
>
>
>
>
> Begin forwarded message:
>
> - The access to image formats via the Java-based bioformats has a big
> performance issue when accessing from C++/Python (or any non-Java system).
> People at Sybit/Switzerland tried their own Jace-based wrapper via BLITZ
> (libBlitzBioFormats), but had to improve Jace first, since the import of
> ONE! 512x512 image took ~60s. CellProfiler has its own way by launching a
> Java VM from Python.
> The simple question behind this is: how to make the access to bioformats
> simpler and faster, which is an issue for the growing Python community.
> I see two solutions for that.
> 1. Using the CellProfiler implementation as a standalone package.
> Performance is unknown to me. Short term issue.
> 2. I was wondering with Carolina Wählby from the Broad how much work it
> really is to collect the most needed formats for HC/HT screening and rewrite
> bioformats as a pure C++ library using the highly developed
> libtiff/libpng/libjpeg while providing a Python interface. For TIFF derivative
> formats (and there are many) this would be a simple job and there are C++ libs
> out there solving the problem already.
> I guess CellProfiler has the same problem. Any opinions?
>
>
>
>
>  **************************
> Wellcome Trust Centre for Gene Regulation & Expression
> College of Life Sciences
> MSI/WTB/JBC Complex
> University of Dundee
> Dow Street
> Dundee  DD1 5EH
> United Kingdom
>
> phone (01382) 385819
> Intl phone:  44 1382 385819
> FAX   (01382) 388072
> email: jason at lifesci.dundee.ac.uk
>
> Lab Page: http://gre.lifesci.dundee.ac.uk/staff/jason_swedlow.html
> Open Microscopy Environment: http://openmicroscopy.org
> **************************
>
> The University of Dundee is a Scottish Registered Charity, No. SC015096.