[ome-devel] Shoola-back-end consistency issues

Wed Jul 6 15:36:00 BST 2005

Josh recently posted a bugzilla report complaining that the zoomable 
dataset browser does not update to reflect changed conditions in the 
database - i.e., if the zoomable browser is started and an image is 
added to the dataset, the zoomable browser will not be updated - even 
if it is restarted.

I replied that this was indeed the case - I have not even tried to 
achieve such consistency. I even have an "optimization" that 
essentially loads the data for the zoomable browser exactly once per 
instance of Shoola. Thus, restarting the zoomable browser won't lead to 
an update. Suboptimal, but this approach _does_ save processing if the 
datasets have indeed not changed.
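
To make the behavior concrete, here is a minimal Python sketch of a 
load-once cache. This is not the Shoola code: BrowserDataCache, 
fetch_images, and the list standing in for the database are all made up 
for illustration.

```python
class BrowserDataCache:
    """Hypothetical stand-in for the load-once "optimization"."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that queries the back-end
        self._images = None      # filled on first use, never invalidated

    def images(self):
        # Only the first call reaches the back-end; later calls reuse the copy.
        if self._images is None:
            self._images = list(self._fetch())
        return self._images


# Simulated back-end state: the image ids in one dataset.
backend = [1, 2, 3]
fetch_count = 0


def fetch_images():
    global fetch_count
    fetch_count += 1
    return backend


cache = BrowserDataCache(fetch_images)
first_view = cache.images()      # browser started: one back-end request
backend.append(4)                # an image is added to the dataset
second_view = cache.images()     # browser "restarted": served from the cache
```

The second view still lacks image 4, because the cache lives as long as 
the client instance and nothing ever invalidates it.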

So, this leads to the more general question: how can we build Shoola 
agents that maintain state that is consistent with the database, & 
therefore display data items consistently?

The more I think about it, the harder this problem seems.  Shoola & 
OME-JAVA are essentially stateless.  Agents request data and get back 
chunks of it.   There is no record of the state of any request on the 
back-end, and no central manager in Shoola or OME-JAVA. Thus, for an 
agent to verify that data is consistent, it must do all of the work on 
its own. There are several solutions, each of which has some 
difficulty:

1) Do nothing - give up on consistency. Easy, but obviously not ideal.

2) Delay retrieval. This is currently done on the data manager. The 
list of images in a dataset is not pre-loaded: instead, it is retrieved 
when the icon for the dataset is expanded.  This works somewhat, but is 
incomplete: what if images are added while the icon is expanded?

3) Refresh: The data manager handles this by providing "refresh" 
buttons.  These buttons work somewhat, but they are less than ideal. A 
naive refresh operation that simply repeats a request will be painfully 
inefficient when one image is added to a dataset of 10K images, but 
anything more sophisticated will be tricky (more on this in a bit).

4) Refresh might be improved by providing some information that clients 
could use to decide when to reload data. Checking to see if the size of 
the dataset has changed might indicate _if_ a change has been made, but 
mechanisms for indicating _what_ has been changed would be needed. The 
client might send back its list of images in the dataset, and get back 
lists of images that had been added & deleted, but this would not be 
terribly efficient either.  I think such information would be necessary 
for this approach to be of use: what if one image is added and one 
deleted? Simply checking for changes in the size of a dataset would not 
catch this case.

5) Josh suggested checking MEXs to look for MEX timestamps that are 
newer than the most recent one the client has seen, indicating a change 
to the dataset. Unfortunately, changes to dataset and project contents 
do not appear to generate MEXs. Any idea why this is? Shouldn't these 
actions have MEXs?

6) We might add version/timestamp info to each row in tables like 
datasets, projects, etc. This is too painful a thought for me to 
contemplate at the moment.

7) Another painful idea would involve building some data managers on 
the client and back-end that would manage state. Perhaps by passing a 
token around for different requests, these managers could provide a 
generally useful and yet reasonably performant consistency mechanism. 
Not having thought through the design, I'm not sure how this would 
work, but I'd bet it could be done reasonably cleanly.
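
As a rough illustration of options 4, 6, and 7, here is a small 
in-memory Python sketch. It is only a sketch under assumptions, not a 
proposed design: diff_ids, VersionedDataset, and changes_since are 
hypothetical names, the "token" is just a per-dataset version counter, 
and nothing here corresponds to an existing Shoola or OME-JAVA call.

```python
def diff_ids(client_ids, server_ids):
    """Option 4: return (added, deleted) relative to the client's list."""
    client, server = set(client_ids), set(server_ids)
    return sorted(server - client), sorted(client - server)


class VersionedDataset:
    """Options 6/7 (hypothetical): bump a version number on every change
    and keep a change log, so a client can ask what changed since its token."""

    def __init__(self, image_ids):
        self.image_ids = set(image_ids)
        self.version = 0
        self._log = []           # entries: (version, "add" | "delete", image_id)

    def add(self, image_id):
        if image_id not in self.image_ids:
            self.image_ids.add(image_id)
            self.version += 1
            self._log.append((self.version, "add", image_id))

    def delete(self, image_id):
        if image_id in self.image_ids:
            self.image_ids.remove(image_id)
            self.version += 1
            self._log.append((self.version, "delete", image_id))

    def changes_since(self, token):
        """Return (new_token, changes made after the client's token)."""
        changes = [(op, i) for v, op, i in self._log if v > token]
        return self.version, changes


dataset = VersionedDataset([1, 2, 3])
client_ids = sorted(dataset.image_ids)    # the client's snapshot
token = dataset.version                   # the version the client has seen

dataset.add(4)                            # one image added ...
dataset.delete(2)                         # ... and one deleted

# The size is unchanged, so a size-only check would miss both changes.
same_size = len(dataset.image_ids) == len(client_ids)
added, deleted = diff_ids(client_ids, dataset.image_ids)
token, changes = dataset.changes_since(token)
```

The diff in option 4 needs the client's full id list on every refresh, 
so its cost grows with the size of the dataset. The token version only 
transmits the changes, at the price of the back-end keeping a change 
log (which this sketch never trims).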

Thoughts/ideas/responses? Unless anyone has any better ideas that are 
not too painful, I think it's safe to say that some combination of 1, 
2, & 3 will continue to be the status quo.

-harry