[ome-devel] Explorations of alternative architectures for remote clients

Mon Aug 15 13:36:05 BST 2005

Ok one and all. Back from baby break. (Hence the many emails) Let's take
a stab at this one.

Harry Hochheiser wrote:
> There are several advantages to this approach:

Mixed-in are the disadvantages.

> - It eliminates the need for us to do our own management of serialization
> and deserialization. All of the OME-Java code and the corresponding
> XML-RPC Perl code on the server can be eliminated, in favor of
> open-source components (tomcat and spring) that can be used right out of
> the box.

Remoting in Spring is really damn simple. Heart-thumping, breath-taking
simple.

<!-- Server side: export the service implementation under its interface -->
<bean name="..." class="...">
  <property name="service" ref="hierarchyBrowsingService"/>
  <property name="serviceInterface" value="..."/>
</bean>

<!-- Client side: proxy that forwards calls on the interface to the server URL -->
<bean id="..." class="...">
  <property name="serviceUrl" value="URL"/>
  <property name="serviceInterface" value="..."/>
  <property name="username" value="${user}"/>
  <property name="password" value="${pass}"/>
</bean>

Finished.

And it's fast. The problem is interoperability. 3 of the 5 or so
remoting frameworks are Java only. The other two (both from caucho.com)
have implementations in C#, Ruby, Python, and C++. However, those are not
nearly as fully developed as the Java version. Chris complained a lot
about the C# version. Just ask. (Though they did release a new version
last week.)

*Note: this is remoting, not REST/SOAP. If you want those, I've got 'em. We
just have to decide on the exchange formats.

> - it's straightforward: the same classes and interfaces can be used both
> on the client and the server.

True. Very true. What this gets you in the Java world is plug-and-play
goodness. The normal setup is running a client which uses one of the
remoting frameworks above to contact the server. If, however, you have
JDBC access (i.e. you are a Postgres user) then you can just use the
server implementation of the interface locally. Voila, speed-up!
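
To make the plug-and-play point concrete, here is a minimal sketch in plain
Java. Everything in it is made up for illustration: HierarchyBrowsing,
LocalHierarchyBrowsing and remoteStub are hypothetical names, and the "remote"
side is only a java.lang.reflect.Proxy standing in for whatever a remoting
framework would do over the wire. The only point is that client code written
against the interface can't tell which one it was handed.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;

// Hypothetical service interface, shared by client and server code.
interface HierarchyBrowsing {
    List<String> datasetNames(int projectId);
}

// Stand-in for the server implementation; a real one would hit the database.
class LocalHierarchyBrowsing implements HierarchyBrowsing {
    public List<String> datasetNames(int projectId) {
        return Arrays.asList("dataset-" + projectId + "-a", "dataset-" + projectId + "-b");
    }
}

class RemotingSketch {
    // Stand-in for a remoting proxy (an assumption, not a real framework):
    // each call passes through a handler, which is where a real framework
    // would serialize the call and send it to the server.
    static HierarchyBrowsing remoteStub(HierarchyBrowsing target, int[] roundTrips) {
        InvocationHandler handler = (proxy, method, args) -> {
            roundTrips[0]++;                    // pretend this is a network round trip
            return method.invoke(target, args); // delegate to the "server" object
        };
        return (HierarchyBrowsing) Proxy.newProxyInstance(
                HierarchyBrowsing.class.getClassLoader(),
                new Class<?>[] { HierarchyBrowsing.class },
                handler);
    }

    // Client code depends only on the interface.
    static int countDatasets(HierarchyBrowsing service, int projectId) {
        return service.datasetNames(projectId).size();
    }

    public static void main(String[] args) {
        int[] roundTrips = {0};
        HierarchyBrowsing local = new LocalHierarchyBrowsing();
        HierarchyBrowsing remote = remoteStub(local, roundTrips);
        System.out.println(countDatasets(remote, 1)); // goes through the stub
        System.out.println(countDatasets(local, 1));  // direct call, no round trip
    }
}
```

In the Spring setup, the choice between the two would be made in the XML
rather than in code, which is what makes the switch so cheap.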

> - It's simple: we get these advantages without having to buy into all of
> the complexity and overhead of an application server.

> - It's flexible: working with tomcat and java provides us with a path
> for doing web development in JSPs and servlets, and generation of
> interfaces for SOAP is straightforward.

Um. It is simple. We may want an application server at some point,
though, for some of the features it offers. Clustering immediately comes
to mind. But for now, tomcat, jetty, resin (various servlet containers)
all suffice to run something like this.

> - Using Spring's "dependency injection"  (essentially extensive support
> for run-time configurability), I can switch from running the code over
> the network to a direct JDBC connection with one line of XML. This
> significantly speeds testing and development (Thanks to Josh for
> pointing me in this direction).

As above.

> - Perhaps most importantly, it's _fast_. for relatively small queries,
> and with all code running on my laptop, i'm seeing an order of magnitude
> speed improvement over the OME-JAVA +XML-RPC + Perl back-end that we
> currently use.

I haven't done full-scale testing, but the speed-ups I've seen range from
one to, in some cases, three orders of magnitude. (The high end may be due
to my running test scenarios for hours on end and the Apache memory
becoming bloated. But hey, avoiding that is a benefit, too.)

> there are also a few downsides:

And mixed in, the solutions (Harry's issues stem from using JDBC rather
than trying to wade into Hibernate, since he knew I was doing that.)

> - It still requires specification of queries and construction of DTO
> trees. Currently, much of this must be done by hand.  This requires some
> work, but (in my opinion) not much more than is required when a new DTO
> request is added to Shoola.

What you get with Hibernate is a model (essentially DTOs but with more
methods) generated from the database and an API which lets you search
and save those Java objects. Don't really have to write queries unless
you want to optimize (which we will). But by and large there's just very
little code to write.

> - The query/DTO construction code is exclusively on the back-end. In
> Shoola, a criteria query can be constructed on the client side and then
> sent to the server for processing. My approach doesn't currently handle
> this, but this is a classic flexibility vs. performance trade-off. The
> generality of the Perl Facades for processing arbitrary criteria for
> DTOs made it _slow_.

This is our "canned-query" discussion. Hibernate offers "named queries"
which are stored on the server. It would be possible to open an API to
create named queries. But how often would we use this? And is the
possibility of XS attacks worth it?

Named queries could be added by an admin at any time, or on a large scale
at any release. This would probably suffice.
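
A rough sketch of the canned-query idea, in plain Java rather than Hibernate
itself: the server keeps a table of query names, clients send only a name
(plus parameters), and anything not in the table is refused. The class
NamedQueryRegistry, the query name and the query text are all hypothetical;
in practice Hibernate's own named-query mechanism would play this role.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical server-side table of canned queries.
class NamedQueryRegistry {
    private final Map<String, String> queries = new HashMap<>();

    // Called by an admin or at release time, never by a remote client.
    void register(String name, String queryText) {
        if (queries.containsKey(name)) {
            throw new IllegalArgumentException("duplicate query name: " + name);
        }
        queries.put(name, queryText);
    }

    // Clients can only refer to queries by name; unknown names are rejected,
    // so no client-supplied query text ever reaches the database.
    String resolve(String name) {
        String queryText = queries.get(name);
        if (queryText == null) {
            throw new IllegalArgumentException("unknown query name: " + name);
        }
        return queryText;
    }

    public static void main(String[] args) {
        NamedQueryRegistry registry = new NamedQueryRegistry();
        // Illustrative query text only; the entity and property names are made up.
        registry.register("datasetsByProject",
                "from Dataset d where d.project.id = :projectId");
        System.out.println(registry.resolve("datasetsByProject"));
    }
}
```

Opening register() to remote callers is the API question raised above; this
sketch keeps it server-side only.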

> - nagging questions like Object uniqueness and client-side caching are
> left unresolved.

Ditto. Within a single call to Hibernate, the referential integrity of
the object graph is ensured (two datasets with the same id are the same
object). Between two calls, you're on your own. This is something we
would need to design for our system. Perhaps LSID based.
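
One possible shape for the LSID idea, purely as a sketch: a client-side map
that hands back one canonical instance per LSID, so objects arriving from
separate calls collapse onto the same Java object. Dataset, IdentityMap and
the LSID strings are hypothetical, and how stale state should be merged is
exactly the design work still to be done; here the newest copy simply wins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical DTO; lsid is assumed to be a globally unique identifier.
class Dataset {
    final String lsid;
    String name;

    Dataset(String lsid, String name) {
        this.lsid = lsid;
        this.name = name;
    }
}

// Client-side identity map: one canonical instance per LSID across calls.
class IdentityMap {
    private final Map<String, Dataset> byLsid = new HashMap<>();

    // Returns the canonical instance for fresh.lsid. If one is already known,
    // its state is refreshed from the new copy and the old object is returned.
    Dataset canonical(Dataset fresh) {
        Dataset known = byLsid.get(fresh.lsid);
        if (known == null) {
            byLsid.put(fresh.lsid, fresh);
            return fresh;
        }
        known.name = fresh.name; // assumed merge policy: newest copy wins
        return known;
    }

    public static void main(String[] args) {
        IdentityMap cache = new IdentityMap();
        // Two copies of the same dataset, as if returned by two separate calls.
        Dataset first = cache.canonical(new Dataset("urn:lsid:example.org:Dataset:1", "old name"));
        Dataset second = cache.canonical(new Dataset("urn:lsid:example.org:Dataset:1", "new name"));
        System.out.println(first == second);
        System.out.println(first.name);
    }
}
```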

> Of course, moving to an architecture like this would be a big step.
> More evaluation and planning would be needed before we could move ome
> clients to a platform like this, and there may be good reasons for not
> doing so.

> we'd need to understand some more of the issues (particularly the role
> of Hibernate), and potential costs in more detail. Plus, we'd  need to
> find some way to migrate existing Shoola apps to a new data
> architecture. However, if we can eliminate literally 1000s of lines of
> plumbing code while improving data transfer performance by a factor of
> 10, we'll be able to achieve qualitatively different levels of output
> the OME user agents. In the spirit of pushing this process along
> I'd like to invite discussion, commentary, criticism, and hopefully
> enthusiasm for taking the next step.

Agreed. In the Java world, it's pretty straightforward. I've converted
Shoola calls from using OME-JAVA to using my code with no problems. The
one extensive test I ran showed a speed-up factor of 36.

If we are not immediately concerned with the .Net world (or letting
Chris play in Ruby :) ), which amounts to the "helping ourselves first"
argument, then the real concern here is OME::Web*, OME::Analysis*,
OME::Import*

Obviously, choice one is duplicating code. One Perl server code base.
One Java server code base. This was the first phase that we discussed in
Baltimore, and it's essentially what I've been up to.

The second phase is rough. It's also something I've put a considerable
amount of thought into. It allows the Perl code to call the Java server,
to save lines of code, as Harry mentioned, and for a (possible)
performance gain. I say "possible" because all of the Perl stuff currently
runs in-process and I just don't know how things are going to fall out
once the networking overhead is added.

On the one hand, with very little work we can get a _very_ large object
graph, while hitting the db just once.

On the other hand, we have to get that graph into Perl. The solution we
discussed in Baltimore was SOAP. Certainly an option, but that's been
less than good to us. Over the weekend, I looked into Inline::Java. I
still need to do performance testing, but I expect it to be faster than
SOAP.

I'm not sure of the other solutions out there. (Guys?) But I do know
that for Shoola, a system like this would make a world of difference.
(In fact, it would most certainly have kept EMBL from saying "it's too
slow")

What are the other hidden costs? Porting & testing, testing, testing (as
ever.)

Fin.

Sorry my response was so long in coming, Harry. Hope you understand.
Best wishes,
  Josh.
