[ome-devel] Avian VM and Bio-Formats

Johannes Schindelin Johannes.Schindelin at gmx.de
Thu Oct 17 01:28:47 BST 2013


Hi Ullrich,

a little background information for everybody else: Ullrich is the
inventor and maintainer of VIGRA, an incredibly powerful C++ library to
process and analyze images. Its design was the most important source of
inspiration for ImgLib2. As a consequence, I was really delighted to meet
Ullrich at the EuBIAS meeting in Barcelona last week. Among other things,
we talked extensively about ways to interoperate between ImgLib2 and
VIGRA, and the lack of support in VIGRA to read file formats via the
Bio-Formats library.

Also at that meeting, I got into contact with the OpenMole project, and in
particular with Mark Hammons who not only explained patiently the benefits
of the Scala language to me, but also pointed out the existence of the
Avian VM, a small, embeddable, BSD-licensed Java Virtual Machine written
in C++, with a tiny Just-In-Time compiler, just enough to make it a
practical choice for running limited Java inside a C++ program/library:
http://oss.readytalk.com/avian/

It is actually very, very, *very* limited, as I found out when I tried to
get it to run the bfconvert tool of Bio-Formats.

But I did get it to run bfconvert. To be precise, in my hands, the 5MB
executable compiled from my fork of Avian was able to run my
(minimally-diverging) Bio-Formats v4.4.8 fork's
loci.formats.tools.ImageConverter class to convert Fiji's icon.png (my
standard example) to a .tiff file (although the colors are off, but
running the same example with plain Java results in a byte-identical
file).

Although it is too early for proper benchmarking, things seem to be
comparable between Java and Avian: while Java is slightly faster in what
time(1) calls "real" time (~0.54s vs ~0.72s), in terms of "user" time
roles are reversed (~0.70s vs ~0.63s). Roughly the same holds true for the
dynamic Avian executable (which is only 11K and links to Avian's libjvm.so
weighing in with a whopping 1.2M). The major benefit will most likely be
space: linking to libjvm.so should be enough, the standard Java classes
are included therein. So you can get Java support in VIGRA (or any other
C++ project) by adding a library that is slightly larger than one
megabyte.

Now to some more detailed explanations about the challenges I faced, for
the technically-inclined.

On the Bio-Formats side, there are two major and one minor (and one micro)
issue:

- to enable logging, Bio-Formats uses a cute hack called
  ReflectedUniverse, which is basically a scripting language for Java
  itself. However, it uses regular expressions, something that Avian's
  class library does not yet support.

- Bio-Formats uses a concurrent hashmap in loci.common.Location.
  Concurrency (JSR-166) is not supported by Avian's class library yet.

- in loci.formats.gui.LegacyQTTools, the value of the java.library.path
  property is used without checking whether the property is null (unset).

- some debug logging in Bio-Formats uses String.format() (not yet
  supported by Avian's class library).

All of these issues could be worked around in the code calling
Bio-Formats: there is no need to use the ImageConverter class (which
enables the logging), the concurrent hashmap could be faked by extending
the non-concurrent hashmap (and just not bothering with the concurrency
because we will most likely instantiate one VM per Bio-Formats call), the
java.library.path property could be initialized to a dummy value and the
logging could be switched off completely.

On the Avian side, I am happy to report that I only had to extend the
class library (apart from one bug fix that I will contribute upstream
later this week). In particular, I had to

- make RandomAccessFile support writing

- provide three dummy AWT classes because Bio-Formats actually links to
  them, but does not use them by default

- provide a couple of interfaces and exceptions that were not yet included
  in Avian

- implement a minimal java.io.DataOutputStream

- provide a FileChannel for RandomAccessFiles (i.e. implement getChannel)

- implement java.lang.Boolean's parse(String) method

- implement java.lang.Character's isISOControl(char) method

- implement fake byte order methods for ByteBuffers (they are hardcoded to
  big endian, but that is what Bio-Formats required in my test)

- provide a dummy SimpleDateFormat which actually does not work, but does
  not need to

- add toString(long[]) to java.util.Arrays

- teach Avian's pseudo regular expressions about "\\", i.e. the backslash

For the moment, all of these changes are contained in hodge podge
work-in-progress commits which I will clean up over the next few days. You
can see them here (the linked pages will update whenever I get around to
clean up the commits):

https://github.com/dscho/avian/compare/bio-formats
https://github.com/dscho/bioformats/compare/v4.4.8...avian

Since VIGRA and ImgLib2 share design concepts, and since SCIFIO supports
ImgLib2 natively, I think that my plan to extend the test to SCIFIO after
cleaning up my Avian fork is sound.

Once that is done, I will add unit tests that our trusty Jenkins will run
whenever SCIFIO (or Avian) changes, to ensure that things continue to
work. These unit tests will be extended as needed, e.g. when someone finds
out that an important use case requires more changes in Avian, still.

This way to call Bio-Formats and SCIFIO could become the default C++ entry
point into those libraries.

Summary: I showed that Bio-Formats/SCIFIO support for VIGRA and other C++
libraries is feasible through the BSD-licensed embeddable Java virtual
machine "Avian". I am confident that I can clean up my patches this week
still, to be contributed to the Avian project.

Ciao,
Johannes


More information about the ome-devel mailing list