[ome-devel] a report on query optimization efforts for remote
clients
Harry Hochheiser
hsh at nih.gov
Mon Apr 4 19:21:26 BST 2005
Following up on some earlier notes, I have some additional results to
report regarding optimization efforts for remote clients.
After the Shoola meeting, we tried a few approaches aimed at populating
large/complex DTOs. The first was the "single/query" method - a
complicated DTO tree was replaced with a single query that included all
of the joins. In theory, this would replace many query calls with a
single call, which would provide a table of results that could be
quickly parsed into the appropriate DTO hierarchy (a hash of hashes).
In practice, this did not work out so well. The complexity of the join
led to a large table. Chain retrieval (12 chains, 187 nodes, 336
links, 124 inputs, 212 outputs, + semantic types) produced a table that
had almost 10K rows. By the time the required processing was finished,
this approach was slower than the original.
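To illustrate the blow-up, here is a minimal sketch in Python (a
stand-in, not the actual OME Perl code). The schema, the sample data, and
the names build_db and fetch_single_query are all hypothetical. It joins
two has-many children of a chain in one query, so the row count is the
product of the child counts rather than their sum.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database with a hypothetical chain schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE chains (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE nodes  (id INTEGER PRIMARY KEY, chain_id INTEGER, module TEXT);
        CREATE TABLE links  (id INTEGER PRIMARY KEY, chain_id INTEGER, label TEXT);
    """)
    conn.execute("INSERT INTO chains VALUES (1, 'demo chain')")
    conn.executemany("INSERT INTO nodes VALUES (?, 1, ?)",
                     [(i, f"module-{i}") for i in range(1, 5)])   # 4 nodes
    conn.executemany("INSERT INTO links VALUES (?, 1, ?)",
                     [(i, f"link-{i}") for i in range(1, 4)])     # 3 links
    return conn


def fetch_single_query(conn):
    """One query with all joins, parsed into a hash of hashes."""
    rows = conn.execute("""
        SELECT c.id, c.name, n.id, n.module, l.id, l.label
        FROM chains c
        LEFT JOIN nodes n ON n.chain_id = c.id
        LEFT JOIN links l ON l.chain_id = c.id
    """).fetchall()

    chains = {}
    for c_id, c_name, n_id, n_module, l_id, l_label in rows:
        chain = chains.setdefault(
            c_id, {"name": c_name, "nodes": {}, "links": {}})
        # The same node and link reappear on many rows; the dicts
        # de-duplicate them, but every row still has to be processed.
        if n_id is not None:
            chain["nodes"][n_id] = {"module": n_module}
        if l_id is not None:
            chain["links"][l_id] = {"label": l_label}
    return chains, len(rows)


chains, row_count = fetch_single_query(build_db())
print(row_count)  # 4 nodes x 3 links = 12 rows for only 7 child records
```

With more has-many joins in the same query the multiplication compounds,
which is consistent with the nearly 10K rows seen for chain retrieval.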
To get around this, we tried a modified one class/query approach. This
involves constructing multiple queries and stitching together results as
needed. The general strategy was to put has-one joins into a single
query, and to have separate queries for each side of has-many or
many-to-many joins. This speeds things up significantly - from ~ 14
sec to ~ 8.5 sec round trip to a java client (via ome-java, with client
and server both on my laptop).
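The stitching strategy can be sketched roughly as follows, again in
Python rather than the real Perl, with a hypothetical schema and
hypothetical names (make_demo_db, fetch_stitched). The has-one join
(chain to owner) stays in the first query; each has-many side gets its
own query, and the results are attached by foreign key.

```python
import sqlite3


def make_demo_db():
    """Tiny in-memory database; the schema is illustrative only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE chains (id INTEGER PRIMARY KEY, name TEXT, owner_id INTEGER);
        CREATE TABLE nodes  (id INTEGER PRIMARY KEY, chain_id INTEGER, module TEXT);
        CREATE TABLE links  (id INTEGER PRIMARY KEY, chain_id INTEGER, label TEXT);
    """)
    conn.execute("INSERT INTO owners VALUES (1, 'demo owner')")
    conn.execute("INSERT INTO chains VALUES (1, 'demo chain', 1)")
    conn.executemany("INSERT INTO nodes VALUES (?, 1, ?)",
                     [(i, f"module-{i}") for i in range(1, 5)])   # 4 nodes
    conn.executemany("INSERT INTO links VALUES (?, 1, ?)",
                     [(i, f"link-{i}") for i in range(1, 4)])     # 3 links
    return conn


def fetch_stitched(conn):
    """One query per class; has-one joins folded into the parent query."""
    rows_seen = 0

    # Query 1: the chain plus its has-one owner, which cannot multiply rows.
    chains = {}
    for c_id, c_name, owner in conn.execute("""
            SELECT c.id, c.name, o.name
            FROM chains c LEFT JOIN owners o ON o.id = c.owner_id"""):
        chains[c_id] = {"name": c_name, "owner": owner,
                        "nodes": {}, "links": {}}
        rows_seen += 1

    # Query 2: the has-many nodes, stitched onto their chain by foreign key.
    for n_id, chain_id, module in conn.execute(
            "SELECT id, chain_id, module FROM nodes"):
        chains[chain_id]["nodes"][n_id] = {"module": module}
        rows_seen += 1

    # Query 3: the has-many links, stitched on the same way.
    for l_id, chain_id, label in conn.execute(
            "SELECT id, chain_id, label FROM links"):
        chains[chain_id]["links"][l_id] = {"label": label}
        rows_seen += 1

    return chains, rows_seen


chains, rows_seen = fetch_stitched(make_demo_db())
print(rows_seen)  # 1 + 4 + 3 = 8 rows, the sum of the record counts
```

The trade-off is a few more round trips to the database in exchange for
result sets that grow additively instead of multiplicatively.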
Ilya then took a look at the SOAP::Lite serializer and concluded that
we could get further improvement by writing custom, ad hoc XML encoding
for the query. Despite protestations that Chris and I made at the
Baltimore meeting, it turns out to be fairly easy to handle this. To
accomplish this, Ilya modified the SOAP::Lite handlers to pass
appropriately blessed objects through unchanged. I then modified the
code to spit out XML, instead of a hash of hashes. This cut the
round-trip execution time down to somewhere in the neighborhood of 3.7
secs.
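As a rough illustration of the idea (not Ilya's SOAP::Lite changes,
which are in Perl and not reproduced here), the sketch below writes XML
directly from the query results instead of handing a nested hash to a
generic serializer. The element names, attribute names, and the
chains_to_xml function are assumptions made up for this example.

```python
from xml.sax.saxutils import quoteattr


def chains_to_xml(chains):
    """Emit ad hoc XML for a dict of chains in a single pass.

    `chains` maps chain id -> {"name": str, "nodes": {id: {"module": str}}}.
    quoteattr() escapes each value and wraps it in quotes.
    """
    parts = ["<chains>"]
    for c_id, chain in sorted(chains.items()):
        parts.append(
            f"<chain id={quoteattr(str(c_id))} "
            f"name={quoteattr(chain['name'])}>")
        for n_id, node in sorted(chain["nodes"].items()):
            parts.append(
                f"<node id={quoteattr(str(n_id))} "
                f"module={quoteattr(node['module'])}/>")
        parts.append("</chain>")
    parts.append("</chains>")
    return "".join(parts)


demo = {1: {"name": "find & count",
            "nodes": {1: {"module": "threshold"},
                      2: {"module": "count <spots>"}}}}
print(chains_to_xml(demo))
```

Because the encoder knows the shape of the data in advance, it can skip
the type inspection a general-purpose serializer performs on every
value, which is presumably where much of the saving comes from.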
These numbers obviously shouldn't be taken as gospel - I haven't
done thorough benchmarking, but I'm fairly convinced that this approach
can lead to a substantial speed-up for large queries.
I'm going to make this code available in a branch on CVS - details
forthcoming. Ilya is going to investigate whether or not this can be
generalized to easily support other queries. Even if generalization
is not possible, special purpose code for expensive queries is not such
a bad thing if it generates this level of speed-up.
-harry