[ome-devel] a report on query optimization efforts for remote clients

Mon Apr 4 19:21:26 BST 2005

Following up on some earlier notes, I have some additional results to 
report regarding optimization efforts for remote clients.

After the Shoola meeting, we tried a few approaches aimed at populating 
large/complex DTOs. The first was the "single query" method: the many 
queries behind a complicated DTO tree were replaced with a single query 
that included all of the joins. In theory, this would replace many 
query calls with a single call, which would return a table of results 
that could be quickly parsed into the appropriate DTO hierarchy (a hash 
of hashes).
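
To make the parsing step concrete, here is a rough sketch (in Python, 
not the Perl we actually run) of folding flat join rows into a hash of 
hashes. The column layout and the nest_joined_rows name are made up for 
illustration and are not the real OME schema or code.

```python
def nest_joined_rows(rows):
    """Fold flat (chain, node, input) join rows into a dict of dicts."""
    chains = {}
    for chain_id, chain_name, node_id, input_id in rows:
        # The chain columns repeat on every row; keep the first copy only.
        chain = chains.setdefault(chain_id, {"name": chain_name, "nodes": {}})
        if node_id is None:  # outer join: chain with no nodes
            continue
        node = chain["nodes"].setdefault(node_id, {"inputs": set()})
        if input_id is not None:
            node["inputs"].add(input_id)
    return chains


# Hypothetical result of one big joined query: one row per
# (chain, node, input) combination.
rows = [
    (1, "align", 10, 100),
    (1, "align", 10, 101),
    (1, "align", 11, None),
    (2, "segment", None, None),
]
chains = nest_joined_rows(rows)
```

Note that each additional has-many join multiplies the number of rows 
that have to be walked and de-duplicated this way.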

In practice, this did not work out so well. The complexity of the join 
led to a large table. Chain retrieval (12 chains, 187 nodes, 336 
links, 124 inputs, 212 outputs, + semantic types) produced a table that 
had almost 10K rows. By the time the required processing was finished, 
this approach was slower than the original.

To get around this, we tried a modified one class/query approach. This 
involves constructing multiple queries and stitching together results as 
needed. The general strategy was to put has-one joins into a single 
query, and to use separate queries for each side of has-many or 
many-to-many joins. This speeds things up significantly: from ~14 
sec to ~8.5 sec round trip to a java client (via ome-java, client and 
server on my laptop).
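
As a rough illustration of the strategy (again in Python rather than 
our Perl, against a toy in-memory SQLite schema that I made up for the 
example), the has-one join from node to module stays inside the node 
query, while chains and nodes get one query each and are stitched 
together by id:

```python
import sqlite3


def load_chains(conn):
    # Query 1: one row per chain.
    chains = {
        chain_id: {"name": name, "nodes": []}
        for chain_id, name in conn.execute("SELECT id, name FROM chains")
    }
    # Query 2: one row per node. The has-one join to modules stays in
    # this query because it cannot multiply rows.
    node_sql = (
        "SELECT n.id, n.chain_id, m.name "
        "FROM nodes n JOIN modules m ON m.id = n.module_id "
        "ORDER BY n.id"
    )
    # Stitch the has-many side onto its parent in memory.
    for node_id, chain_id, module_name in conn.execute(node_sql):
        chains[chain_id]["nodes"].append({"id": node_id, "module": module_name})
    return chains


# Toy schema and data, hypothetical and far smaller than the real thing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chains (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE modules (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, chain_id INTEGER,
                        module_id INTEGER);
    INSERT INTO chains VALUES (1, 'align'), (2, 'segment');
    INSERT INTO modules VALUES (1, 'threshold'), (2, 'blur');
    INSERT INTO nodes VALUES (10, 1, 1), (11, 1, 2), (12, 2, 1);
""")
chains = load_chains(conn)
```

Each query returns one row per object of its class, so the total number 
of rows grows with the sum of the table sizes rather than their product.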

Ilya then took a look at the SOAP::Lite serializer and concluded that 
we could get further improvement by writing custom, ad hoc XML encoding 
for the query. Despite protestations that Chris and I made at the 
Baltimore meeting, it turns out to be fairly easy to handle this. To 
accomplish this, Ilya modified the SOAP::Lite handlers to pass 
appropriately blessed objects through unchanged. I then modified the 
code to spit out XML, instead of a hash of hashes. This cut the 
round-trip execution time down to somewhere in the neighborhood of 3.7 
secs.
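
For a sense of what "spitting out XML" directly looks like, here is a 
small Python sketch that writes the stitched structure straight to a 
string instead of handing a nested hash to a generic serializer. The 
element and attribute names are hypothetical and are not the actual 
wire format; the real code is Perl sitting behind SOAP::Lite.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def chains_to_xml(chains):
    """Serialize {id: {"name": ..., "nodes": [...]}} directly to XML text."""
    parts = ["<chains>"]
    for chain_id in sorted(chains):
        chain = chains[chain_id]
        # quoteattr adds the surrounding quotes and escapes &, < and >.
        parts.append("<chain id=%s name=%s>"
                     % (quoteattr(str(chain_id)), quoteattr(chain["name"])))
        for node in chain["nodes"]:
            parts.append("<node id=%s module=%s/>"
                         % (quoteattr(str(node["id"])),
                            quoteattr(node["module"])))
        parts.append("</chain>")
    parts.append("</chains>")
    return "".join(parts)


chains = {
    1: {"name": "align & crop",
        "nodes": [{"id": 10, "module": "threshold"}]},
}
xml_text = chains_to_xml(chains)

# Round-trip through a real parser to check the output is well-formed.
root = ET.fromstring(xml_text)
```

The point is that the encoder knows the shape of the data in advance, 
so it can skip the per-value type inspection a general-purpose 
serializer has to do.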

These numbers obviously shouldn't be taken as gospel - I haven't 
done thorough benchmarking, but I'm fairly convinced that this approach 
can lead to a substantial speed-up for large queries.

I'm going to make this code available in a branch on CVS - details 
forthcoming. Ilya is going to investigate whether or not this can be 
generalized to easily support other queries. Even if generalization 
is not possible, special-purpose code for expensive queries is not such 
a bad thing if it generates this level of speed-up.

-harry