[ome-devel] Explorations of alternative architectures for remote clients

Harry Hochheiser hsh at nih.gov
Mon Aug 15 14:45:47 BST 2005


Josh:

> And it's fast. The problem is interoperability. 3 of the 5 or so
> remoting frameworks are Java only. The other two (both from caucho.com)
> have implementations in C#, ruby, python, and C++. However, they're not
> nearly as fully developed as the Java version. Chris complained lots
> about the C# version. Just ask. (Though they just released a new
> version last week.)
>
> *Note: this is remoting, not REST/SOAP. If you want those, I got 'em.
> We just have to decide on the exchange formats.

Personally, I think that if we can do Java really well and really fast, 
while providing (via REST/SOAP) paths for other languages, that's 
pretty good. Holding out for something that's fast and completely 
flexible for multiple languages might not be totally realistic.


>> - It's flexible: working with tomcat and java provides us with a path
>> for doing web development in JSPs and servlets, and generation of
>> interfaces for SOAP is straightforward.
>
> Um. It is simple. We may want an application server at some point,
> though, for some of the features it offers. Clustering immediately
> comes to mind. But for now, tomcat, jetty, resin (various servlet
> containers) all suffice to run something like this.
>
Right. The key here is that using these widely accepted technologies
moves us closer to a point where it would be possible to swap various
components in and out as needed.
>
> .....
>
>> - The query/DTO construction code is exclusively on the back-end. In
>> Shoola, a criteria query can be constructed on the client side and
>> then sent to the server for processing. My approach doesn't currently
>> handle this, but this is a classic flexibility vs. performance
>> trade-off. The generality of the Perl Facades for processing
>> arbitrary criteria for DTOs made it _slow_.
>
> This is our "canned-query" discussion. Hibernate offers "named queries"
> which are stored on the server. It would be possible to open an API to
> create named queries. But how often would we use this? And is the
> possibility of XS attacks worth it?
>
> Named queries could be added by an admin at any time, or on a large
> scale at any release. This would probably suffice.
>

I agree. I don't see this as being too much of an issue.
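For context, a server-stored named query in Hibernate is just a mapping-file entry. The fragment below is only a sketch: the query name, the `Dataset` entity, and the `ownerId` parameter are hypothetical placeholders, not actual OME mappings.

```xml
<!-- In a *.hbm.xml mapping file: a named query stored server-side. -->
<query name="datasetsByOwner">
    from Dataset d where d.owner.id = :ownerId
</query>
```

Client code would then retrieve it by name via `session.getNamedQuery("datasetsByOwner")` and bind the parameter, rather than shipping arbitrary criteria over the wire — which is what keeps the query surface "canned".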

>> - nagging questions like Object uniqueness and client-side caching are
>> left unresolved.
>
> Ditto. Within a single call to hibernate, the referential integrity of
> the object graph is ensured (two datasets with the same id are the same
> object). Between two calls, you're on your own. This is something we
> would need to design for our system. Perhaps LSID based.
>

I have some ideas of how we might do this, but I haven't fleshed them
out completely, and they wouldn't be trivial. Would moving to an
application server help in this regard?
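For the between-calls case, one plausible design is a small client-side identity map keyed by LSID, so that two fetches of the same logical object resolve to one local instance. A minimal sketch only — the `IdentityCache` class and the LSID strings are hypothetical, not existing OME or Hibernate API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a client-side identity map keyed by LSID, ensuring that two
// fetches of the same logical object resolve to one local instance.
public class IdentityCache {
    private final Map<String, Object> byLsid = new HashMap<>();

    // Return the already-cached instance for this LSID if present;
    // otherwise register the freshly fetched object and return it.
    public Object canonicalize(String lsid, Object fetched) {
        Object existing = byLsid.get(lsid);
        if (existing != null) {
            return existing;
        }
        byLsid.put(lsid, fetched);
        return fetched;
    }

    public static void main(String[] args) {
        IdentityCache cache = new IdentityCache();
        Object a = cache.canonicalize("urn:lsid:example.org:Dataset:1", new Object());
        Object b = cache.canonicalize("urn:lsid:example.org:Dataset:1", new Object());
        // Same LSID across two calls -> same local instance.
        System.out.println(a == b); // prints "true"
    }
}
```

A real version would need an eviction/staleness policy (e.g. weak references) so the cache doesn't pin every object ever fetched — which is exactly where an application server's caching support might help.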
>
>> Of course, moving to an architecture like this would be a big step.
>> More evaluation and planning would be needed before we could move OME
>> clients to a platform like this, and there may be good reasons for
>> not doing so.
>
>> We'd need to understand some more of the issues (particularly the
>> role of Hibernate), and the potential costs, in more detail. Plus,
>> we'd need to find some way to migrate existing Shoola apps to a new
>> data architecture. However, if we can eliminate literally thousands
>> of lines of plumbing code while improving data transfer performance
>> by a factor of 10, we'll be able to achieve qualitatively different
>> levels of output from the OME user agents. In the spirit of pushing
>> this process along, I'd like to invite discussion, commentary,
>> criticism, and hopefully enthusiasm for taking the next step.
>
> Agreed. In the Java world, it's pretty straightforward. I've converted
> Shoola calls from using OME-JAVA to using my code with no problems. The
> one extensive test I ran showed a speed-up factor of 36.
>
> If we are not immediately concerned with the .Net world (or letting
> Chris play in ruby :) ), which amounts to the "helping ourselves first"
> argument, then the real concern here is OME::Web*, OME::Analysis*,
> OME::Import*
>
And OME::Tasks...

> Obviously, choice one is duplicating code. One Perl server code base.
> One Java server code base. This was the first phase that we discussed
> in Baltimore, and it's essentially what I've been up to.
>
> The second phase is rough. It's also something I've put a considerable
> amount of thought into. It allows the Perl code to call the Java
> server, to save lines of code, as Harry mentioned, and for a (possible)
> performance gain. I say "possible" because all of the Perl stuff is
> in-process and I just don't know how things are going to fall out once
> the networking issues come into play.
>
> On the one hand, with very little work we can get a _very_ large object
> graph, while hitting the db just once.
>
> On the other hand, we have to get that graph into Perl. The solution we
> discussed in Baltimore was SOAP. Certainly an option, but that's been
> less than good to us. Over the weekend, I looked into Inline::Java. I
> still need to do performance testing, but it's going to be faster.
>
> I'm not sure of the other solutions out there. (Guys?) But I do know
> that for Shoola, a system like this would make a world of difference.
> (In fact, it would most certainly have kept EMBL from saying "it's too
> slow")
>

I tend to think that a dual code base will be the easiest and most
maintainable thing to do for the short to medium term. Making the Perl
code talk to the Java server is certainly a possibility. This would have
the advantage of eliminating things like Factory, DBObject, etc., but it
might be tricky to tease out.

Another option would be a gradual migration of the back-end from Perl
to Java. Web functionality could be migrated to run as JSPs/servlets,
perhaps starting with new functionality; the old functionality could
then be moved over later.

That said, I wouldn't reject out of hand the idea of maintaining dual
code bases for an arbitrary amount of time. That is probably the least
effort for the biggest bang for the buck.

Josh, the Hibernate data classes are auto-generated from the database
contents, right? If so, the consistency problems (i.e., the Perl and
Java code getting out of sync) are likely to be fairly manageable. I'm
guessing we'd probably agree that we wouldn't choose to have parallel
code bases, but it may be the most cost-effective approach.

-harry
