[ome-devel] Discussion of future ome-model work

Wed Dec 12 15:42:58 GMT 2018

Dear all,

In the course of working on OME Files, I've encountered a few issues 
which require addressing in order to make it suitable for wider 
deployment.  Some of these issues also affect the Java code, and/or the 
Java code would benefit from the same changes being made.  Preserving 
compatibility between the two is important, and so I would like to be in 
a position to update both codebases and push the Java changes up to you, 
should that be something you would be interested in.

The general areas are:

* Improving the transforms
   - Xalan is being retired by Apache 
(https://marc.info/?l=xalan-dev&m=154450606602126&w=2) after many years 
of neglect
   - Our model transforms are Xalan-specific and only work with this 
single XSLT implementation (they are actually buggy with respect to 
namespace transformation, and likely other features as well)
   - Other C++ and Java implementations (Qt XmlPatterns, Java Saxon-HE) 
provide XSLT 2.0+ and would require the transforms to be rewritten in 
part or in full to work with them.  Even though they can fall back to 
XSLT 1.0, our transforms are too buggy and implementation-specific to 
be useful without being rewritten.  Transforms rewritten to use XSLT 2.0 
features should be rather simpler, because namespace transforms (for 
example) are supported directly by the language.

* Alternative OME Model serialisation backends
   - Currently limited to Xerces XML DOM, hardcoded in the model objects
   - Allow use of C++ Qt Xml DOM to replace Apache Xerces-C++ DOM
   - Allow use of SAX for efficiency
   - Allow use of SQLite or another database (or even OMERO directly) 
to address scalability problems with memory utilisation

Not all of these backends are intended to be implemented right now; they 
are just examples.  The main requirement I have for OME Files is to be 
able to drop Xerces and Xalan entirely, in favour of alternative XML and 
XSLT implementations which are properly supported and deployable on 
contemporary platforms.  The candidates I have right now are the Qt5 Xml 
and XmlPatterns libraries which are mostly drop-in replacements for 
Xerces and Xalan, respectively (bar the incompatible transforms noted 
above).  I can certainly begin by using #ifdefs to select an 
implementation, but refactoring would open up some very interesting 
possibilities for both the C++ and Java codebases, without compromising 
compatibility.
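
To make the #ifdef option concrete, here is a minimal sketch of 
compile-time backend selection.  The namespaces and the OME_USE_QTXML 
macro are hypothetical names made up for illustration; the real code 
would wrap the Xerces-C++ and Qt Xml types rather than these stand-ins.

```cpp
#include <string>

// Hypothetical stand-ins for the two DOM implementations.  Real code
// would wrap the Xerces-C++ and Qt Xml types here instead.
namespace xerces_backend {
  inline std::string name() { return "xerces"; }
}
namespace qtxml_backend {
  inline std::string name() { return "qtxml"; }
}

// OME_USE_QTXML is an assumed build-time macro, not an existing option.
#ifdef OME_USE_QTXML
namespace dom = qtxml_backend;
#else
namespace dom = xerces_backend;
#endif

// Callers refer only to the alias, so the choice is fixed at compile time.
inline std::string active_dom_backend() { return dom::name(); }
```

This keeps a single code path per build, but every build is tied to 
exactly one implementation, which is the limitation the refactoring 
would remove.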

In order to support multiple backends, it would be useful to split up 
ome-xml into separate libraries and jars (example names, to illustrate 
the point):

   ome-model-objects (all model objects, no serialisation logic)
   ome-model-backend-xml-dom (existing XML DOM serialisation logic)
   ome-model-backend-qtxml-dom (putative Qt Xml DOM serialisation logic)
   ome-model-backend-sqlite (putative SQLite serialisation logic)
   ome-xml (same as existing library, using the base classes and 
serialisation logic from the first two libraries)

This would permit the development of alternative backends while still 
keeping the ome-xml library around without any compatibility break 
required.  You would be able to switch from using ome-xml to using the 
lower-level components directly for more flexibility in the choice of 
serialisation backend.  With appropriate use of annotations, you might 
(for example) be able to use SQLite as an even faster replacement for 
memoisation; opening an image in Bio-Formats could avoid nearly all of 
the [de]serialisation overhead, making initialisation both minimal-cost 
and constant-time.
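
As a rough illustration of the split between model objects and 
backends, the sketch below keeps a model object free of serialisation 
logic and routes storage through an abstract interface.  All of the 
names (Image, ModelBackend, InMemoryBackend, round_trip) are 
hypothetical, and the in-memory backend is only a toy standing in for 
an XML DOM or SQLite implementation.

```cpp
#include <map>
#include <string>

// Model object with no serialisation logic (cf. "ome-model-objects").
struct Image {
  std::string id;
  std::string name;
};

// Hypothetical backend interface; each backend library would implement it.
class ModelBackend {
public:
  virtual ~ModelBackend() = default;
  virtual void store(const Image& image) = 0;
  virtual Image load(const std::string& id) const = 0;
};

// Toy in-memory backend standing in for an XML DOM or SQLite backend.
class InMemoryBackend : public ModelBackend {
public:
  void store(const Image& image) override { images_[image.id] = image; }
  // Throws std::out_of_range if the id has not been stored.
  Image load(const std::string& id) const override { return images_.at(id); }
private:
  std::map<std::string, Image> images_;
};

// Code written against the interface does not change when the backend does.
inline Image round_trip(ModelBackend& backend, const Image& image) {
  backend.store(image);
  return backend.load(image.id);
}
```

Under this arrangement ome-xml would simply pair the model objects with 
the existing XML DOM backend, so current users would see no change.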

It would also be advantageous to split up the specification component 
(again, example names to illustrate the point):

   ome-schemas (XSD schema data)
   ome-ontology-owl (OWL ontology data)
   ome-transforms-xslt1 (existing XSLT 1.0 transforms)
   ome-transforms-xslt2 (putative XSLT 2.0 transforms)
   ome-transforms-xxx (putative non-XSL transforms, e.g. SQLite schema 
upgrade)
   ome-model-mock-objects [XMLMockObjects]
   ome-schema-validator [OmeValidator]
   ome-schema-resolver [SchemaResolver]
   specification (containing all of the previous)

This would allow the broken inverted dependency between ome-xml and 
specification to be circumvented: the individual split-out libraries 
would declare the correct dependencies, with the Java code taken out of 
the picture.  For example, Bio-Formats code using the mock 
objects would no longer need to depend upon the specification, and code 
relying on the schemas wouldn't need to drag in all of the mock objects, 
validator and ome-xml code.  Because specification can continue to 
assemble all this content in a single jar, we would again avoid any 
compatibility break.

Would you be interested in having a chat sometime to discuss these 
points?  I can write up the proposed changes in more detail, either as 
GitHub issues or design proposal PRs.

Kind regards,
Roger