[ome-devel] Discussion of future ome-model work

Mon Dec 17 20:20:05 GMT 2018

On Wed, Dec 12, 2018 at 4:43 PM Roger Leigh <rleigh at codelibre.net> wrote:
>
> Dear all,

Hi Roger,

> In the course of working on OME Files, I've encountered a few issues
> which require addressing in order to make it suitable for wider
> deployment.  Some of these issues also affect the Java code, and/or the
> Java code would benefit from the same changes being made.  Preserving
> compatibility between the two is important, and so I would like to be in
> a position to update both codebases and push the Java changes up to you,
> should that be something you would be interested in.

Thanks for the email and the suggestions.  There’s a lot here, and
ideally we’d like to see as much of it happen as possible, but for
now, let’s see if we can pick out the most immediate wins:

> The general areas are:
>
> * Improving the transforms
>    - Xalan is being retired by Apache
> (https://marc.info/?l=xalan-dev&m=154450606602126&w=2) after many years
> of neglect
>    - Our model transforms are Xalan-specific and only work with this
> single XSLT implementation (they are actually buggy with respect to
> namespace transformation, and likely other features as well)
>    - Other C++ and Java implementations (Qt XmlPatterns, Java Saxon-HE)
> provide XSLT 2.0+ and would require the transforms rewriting in part or
> full to work with them.  Even though they can fall back to XSLT 1.0, our
> transforms are too buggy and implementation-specific to be useful
> without being rewritten.  The rewritten transforms using XSLT 2.0
> features should be rather simpler because namespace transforms (for
> example) are supported directly by the language.

Which transforms specifically are incompatible with the new library? A
rewrite of all the XSLTs would be a significant undertaking. And for
some time, we may even need to support both the new and old
transforms. Could you see multiple transform directories working?

(Also, what specific bugs are you seeing in the XSLTs? Do these need
writing up as separate issues or do you see it more as part of a
design issue?)
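
To make the "multiple transform directories" question concrete, here
is a rough sketch of the kind of lookup I have in mind.  It is only an
illustration: the directory names, the function name and the
"<source>-to-<target>.xsl" naming are assumptions for the sake of
discussion, not a description of the current repository layout, and
Python is used purely for brevity.

```python
from pathlib import PurePosixPath

# Hypothetical layout: one directory per XSLT language version, so the
# old and new transforms can be shipped side by side for a while.
TRANSFORM_DIRS = {
    "1.0": PurePosixPath("transforms/xslt1"),
    "2.0": PurePosixPath("transforms/xslt2"),
}


def stylesheet_for(xslt_version: str, source: str, target: str) -> PurePosixPath:
    """Return the stylesheet path for one schema upgrade or downgrade step.

    The file naming used here is an assumption made for this sketch.
    """
    try:
        base = TRANSFORM_DIRS[xslt_version]
    except KeyError:
        raise ValueError(f"no transforms for XSLT {xslt_version}") from None
    return base / f"{source}-to-{target}.xsl"
```

A caller would pick the directory from whatever its XSLT processor
supports, so existing consumers could stay on the 1.0 set unchanged
while the 2.0 set is being developed.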

> * Alternative OME Model serialisation backends
>    - Currently limited to Xerces XML DOM, hardcoded in the model objects
>    - Allow use of C++ Qt Xml DOM to replace Apache Xerces-C++ DOM
>    - Allow use of SAX for efficiency
>    - Allow use of SQLite, or other database, to improve scalability
> problems with memory utilisation (or even OMERO directly)
>
> Not all these backends are intended to be implemented right now, they
> are just examples.  The main requirement I have for OME Files is to be
> able to drop Xerces and Xalan entirely, in favour of alternative XML and
> XSLT implementations which are properly supported and deployable on
> contemporary platforms.  The candidates I have right now are the Qt5 Xml
> and XmlPatterns libraries which are mostly drop-in replacements for
> Xerces and Xalan, respectively (bar the incompatible transforms noted
> above).  I can certainly begin by using #ifdefs to select an
> implementation, but refactoring would open up some very interesting
> possibilities for both the C++ and Java codebases, without compromising
> compatibility.
>
> In order to support multiple backends, it would be useful to split up
> ome-xml into separate libraries and jars (example names, to illustrate
> the point):
>
>    ome-model-objects (all model objects, no serialisation logic)
>    ome-model-backend-xml-dom (existing XML DOM serialisation logic)
>    ome-model-backend-qtxml-dom (putative Qt Xml DOM serialisation logic)
>    ome-model-backend-sqlite (putative SQLite serialisation logic)
>    ome-xml (same as existing library, using the base classes and
> serialisation logic from the first two libraries)
>
> This would permit the development of alternative backends while still
> keeping the ome-xml library around without any compatibility break
> required.  You would be able to switch from using ome-xml to using the
> lower-level components directly for more flexibility in the choice of
> serialisation backend.  With appropriate use of annotations, you might
> (for example) be able to use SQLite as an even faster replacement for
> memoisation; opening an image in Bio-Formats could avoid nearly all the
> [de]serialisation overhead entirely, making initialisation both minimal
> cost and constant-time.
>
> It would also be advantageous to split up the specification component
> (again, example names to illustrate the point):
>
>    ome-schemas (XSD schema data)
>    ome-ontology-owl (OWL ontology data)
>    ome-transforms-xslt1 (existing XSLT 1.0 transforms)
>    ome-transforms-xslt2 (putative XSLT 2.0 transforms)
>    ome-transforms-xxx (putative non-XSL transforms, e.g. SQLite schema
> upgrade)
>    ome-model-mock-objects [XMLMockObjects]
>    ome-schema-validator [OmeValidator]
>    ome-schema-resolver [SchemaResolver]
>    specification (containing all of the previous)
>
> This would allow the broken inverted dependency between ome-xml and
> specification to be circumvented, by having the individual split out
> libraries use the correct dependencies with the Java code being taken
> out of the picture.  For example, Bio-Formats code using the mock
> objects would no longer need to depend upon the specification, and code
> relying on the schemas wouldn't need to drag in all of the mock objects,
> validator and ome-xml code.  Because specification can continue to
> assemble all this content in a single jar, we would again avoid any
> compatibility break.

How much of the new backend work can be performed without needing to
split all the repositories? We are concerned that these changes could
cause significant instability and, as it happens, our priorities are
elsewhere: new binary vessels, publication of OME OWL, and updates to
OMERO and IDR.

Can we perhaps start with the prototype of a single second backend
before causing too much churn?
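
As a starting point for that prototype, here is a minimal sketch of
how I read the proposed split: model objects that carry no
serialisation logic, plus interchangeable backends behind one small
interface.  Every name below (Image, Backend, XmlDomBackend,
SqliteBackend, round_trip) is hypothetical, the XML backend uses
Python's ElementTree rather than Xerces or Qt Xml, and the table
layout is invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Image:
    """Hypothetical model object: plain data, no serialisation logic."""
    id: str
    name: str


class Backend(Protocol):
    """Hypothetical interface that each serialisation backend implements."""
    def save(self, image: Image) -> None: ...
    def load(self, image_id: str) -> Image: ...


class XmlDomBackend:
    """Stand-in for the existing XML DOM serialisation logic."""
    def __init__(self) -> None:
        self.root = ET.Element("OME")

    def save(self, image: Image) -> None:
        # Append one <Image> element holding the object's fields.
        ET.SubElement(self.root, "Image", {"ID": image.id, "Name": image.name})

    def load(self, image_id: str) -> Image:
        for node in self.root.iter("Image"):
            if node.get("ID") == image_id:
                return Image(id=node.get("ID"), name=node.get("Name"))
        raise KeyError(image_id)


class SqliteBackend:
    """Stand-in for a putative SQLite backend (in-memory by default)."""
    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS image (id TEXT PRIMARY KEY, name TEXT)"
        )

    def save(self, image: Image) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO image VALUES (?, ?)", (image.id, image.name)
        )
        self.db.commit()

    def load(self, image_id: str) -> Image:
        row = self.db.execute(
            "SELECT id, name FROM image WHERE id = ?", (image_id,)
        ).fetchone()
        if row is None:
            raise KeyError(image_id)
        return Image(id=row[0], name=row[1])


def round_trip(backend: Backend, image: Image) -> Image:
    """Save and reload through whichever backend the caller supplies."""
    backend.save(image)
    return backend.load(image.id)
```

If something of this shape can live alongside the existing ome-xml
code, a second backend could be trialled without first splitting the
repositories.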

> Would you be interested in having a chat sometime to discuss these
> points?  I can write up the proposed changes in more detail, either as
> GitHub issues or design proposal PRs.

As you did with the OME-TIFF pyramidal process, could you write up two
design issues, one for v2 of the XSLT transforms and one for a
prototype model serialization backend, each including intermediate
steps, so that we can iterate there?

> Kind regards,
> Roger

All the best,
~Josh