[ome-devel] Shoola-back-end consistency issues

Wed Jul 6 17:53:46 BST 2005

Ilya:

On Jul 6, 2005, at 12:07 PM, Ilya Goldberg wrote:
>>
>> The more I think about it, the harder this problem seems.  Shoola & 
>> OME-JAVA are essentially stateless.  Agents request data and get back 
>> chunks of it.   There is no record of the state of any request on the 
>> back-end, and no central manager in Shoola or OME-JAVA. Thus, for an 
>> agent to verify that data is consistent, it must do all of the work 
>> on its own. There are several solutions, each of which has some 
>> difficulty:
>>
>> 1) Do nothing - give up on consistency. Easy, but obviously not ideal.
>
> Unfortunately, this is not even an option.  We've fought long and hard 
> for data consistency for the last five years.  We're not giving up on 
> it right at the client.  That's pure insanity.

Fair enough. This was something of a straw-man suggestion. It also
reflects the current state of the world.

>> 2) Delay retrieval. This is currently done in the data manager. The 
>> list of images in a dataset is not pre-loaded: instead, it is 
>> retrieved when the icon for the dataset is expanded.  This works 
>> somewhat, but is incomplete: what if images are added while the icon 
>> is expanded?
>>
>> 3) Refresh: The data manager handles this by providing "refresh" 
>> buttons.  These buttons work somewhat, but they are less than ideal. 
>> A naive refresh operation that simply repeats a request will be 
>> painfully inefficient when one image is added to a dataset of 10K 
>> images, but anything more sophisticated will be tricky (more on this 
>> in a bit).
>
> One of these two has to be implemented.  It's perfectly fine that 
> restarting an agent reloads its saved state even if it's inconsistent.  
> But a separate button must be provided that ensures that the agent is 
> in a consistent state.  Either the agent is completely stateless and 
> gets its state only from the DB, or it must have a refresh button to 
> resynchronize state.  Even a stateless agent has state on the screen, 
> so even these need a button to let the user ensure that what is 
> displayed is what is in the DB.
>

I think it has to be a combination of the two. When appropriate, 
don't load the information until it's needed. Otherwise, do a refresh.
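A minimal sketch of that combination, in Python for illustration only: the class name DatasetNode and the fetch_image_ids callable are hypothetical stand-ins for the data manager's tree node and an OME-JAVA request. The image list is fetched the first time the node is expanded (option 2), and refresh() discards the cached list and repeats the full request (the naive form of option 3).

```python
from typing import Callable, List, Optional


class DatasetNode:
    """Tree node for one dataset whose image list is loaded lazily.

    fetch_image_ids is a hypothetical stand-in for a back-end request.
    """

    def __init__(self, dataset_id: int,
                 fetch_image_ids: Callable[[int], List[int]]) -> None:
        self.dataset_id = dataset_id
        self._fetch = fetch_image_ids
        self._image_ids: Optional[List[int]] = None  # None: not loaded yet

    def expand(self) -> List[int]:
        # Delayed retrieval: contact the back-end only on the first expansion.
        if self._image_ids is None:
            self._image_ids = self._fetch(self.dataset_id)
        return self._image_ids

    def refresh(self) -> List[int]:
        # Naive refresh: drop the cached list and repeat the full request.
        self._image_ids = None
        return self.expand()
```

Note that expand() keeps returning the cached list after the back-end changes; only refresh() picks up new images, and it pays for the whole list every time.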

>> 4) Refresh might be improved by providing some information that 
>> clients could use to decide when to reload data. Checking to see if 
>> the size of the dataset has changed might indicate _if_ a change has 
>> been made, but mechanisms for indicating _what_ has been changed 
>> would be needed. The client might send back its list of images in the 
>> dataset, and get back lists of images that had been added & deleted, 
>> but this might not be terribly efficient either.  I think such information 
>> would be necessary for this approach to be of use: what if one image 
>> is added and one deleted? Simply checking changes in the size of a 
>> dataset would not catch this case.
>>
>> 5) Josh suggested checking MEXs to look for MEX timestamps that are 
>> newer than the most recent one, indicating a change to the dataset. 
>> Unfortunately, changes to dataset and project contents do not appear 
>> to generate MEXs. Any idea why this is? Shouldn't these actions have 
>> MEXes?
>>
>> 6) We might add version/timestamp info to each row in tables like 
>> datasets, projects, etc. This is too painful a thought for me to 
>> contemplate at the moment.
>
> This is actually not that bad.  If certain aspects of the container 
> objects are modifiable without a MEX, then it would not be very 
> difficult to add a standard suite of timestamps to them (last access, 
> last modified).  We don't need versioning for this.  This can be done 
> with DB rules in postgres, so it wouldn't necessarily even require any 
> action by code to update these.  But this is an ad hoc solution to a 
> specific problem:  A way to check the container objects for things 
> about them that can be modified without a MEX.  Adding a new attribute 
> to a dataset, for example, would mark it as "modified" just like 
> adding a new image to it would.  Or, we could add a separate last MEX 
> timestamp (though that would be redundant).

Could we change code for modifying datasets, projects, etc. to generate 
a MEX for each action? Would this do the trick?
More discussion of these alternatives may be necessary.
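To make option 6 concrete, here is a sketch that uses SQLite triggers as a stand-in for PostgreSQL rules; the schema (datasets, image_dataset_map, change_count) is hypothetical and not the actual OME schema. A monotonically increasing change counter takes the place of a last-modified timestamp so the example is deterministic. The client remembers the value it last saw and reloads only when it differs; the same check would apply to the newest MEX timestamp if dataset edits generated MEXes.

```python
import sqlite3

# Hypothetical schema. SQLite triggers stand in for PostgreSQL rules, and
# change_count stands in for a last-modified timestamp.
SCHEMA = """
CREATE TABLE datasets (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    change_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE image_dataset_map (
    image_id INTEGER NOT NULL,
    dataset_id INTEGER NOT NULL REFERENCES datasets(id)
);
-- Bump the dataset's counter whenever an image is linked or unlinked.
CREATE TRIGGER map_insert AFTER INSERT ON image_dataset_map
BEGIN
    UPDATE datasets SET change_count = change_count + 1
    WHERE id = NEW.dataset_id;
END;
CREATE TRIGGER map_delete AFTER DELETE ON image_dataset_map
BEGIN
    UPDATE datasets SET change_count = change_count + 1
    WHERE id = OLD.dataset_id;
END;
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the schema and triggers installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def change_marker(conn: sqlite3.Connection, dataset_id: int) -> int:
    """Return the dataset's current change counter."""
    row = conn.execute(
        "SELECT change_count FROM datasets WHERE id = ?", (dataset_id,)
    ).fetchone()
    return row[0]


def needs_reload(conn: sqlite3.Connection, dataset_id: int,
                 last_seen: int) -> bool:
    """True if the dataset's contents changed since the client last looked."""
    return change_marker(conn, dataset_id) != last_seen
```

Unlike a size check, this catches one image being added and another removed: the number of images stays the same, but the counter moves.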

>>
>> 7) Another painful idea would involve building some data managers on 
>> the client and back-end that would manage state. Perhaps by passing a 
>> token around for different requests, these managers could provide a 
>> generally useful and yet reasonably performant consistency mechanism. 
>> Not having thought through the design, I'm not sure how this would 
>> work, but I'd bet it could be done reasonably cleanly.
>
> Well, the UserState object could be used for this.  It already 
> maintains a timestamp of when it was last accessed.  Hitting the 
> refresh button when the session was not accessed since the last time 
> it was refreshed wouldn't accomplish much.  Or would it?  Some other 
> user with access to the dataset could have edited its image content.  
> Remember that other users (and even the same user) have access to the 
> same DB through other means.  Consistency can't be ensured by a single 
> client, or by any token bound to a single client.

Where is this object?

The token may be necessary, but not sufficient, for consistency.  If 
the client and the back-end maintain some record of the state of a 
query, the client could send the token to the back-end. The back-end 
could then re-run the query, compare the new results with the previous 
ones to determine what had changed, and send an appropriately packaged 
response to the client. The token is simply a means of referring to the 
query being specified and to the results of prior invocations of it.
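A rough sketch of the token idea, assuming a hypothetical QueryRegistry living on the back-end (nothing like it exists in OME-JAVA today): register() runs a query and returns a token plus the results, and refresh() re-runs the query for that token and returns only the items added and removed since the previous invocation. Because the query is re-run against current data, changes made by other users or other clients are picked up; the token only names the query. This also produces the added/deleted lists that option 4 calls for.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Tuple

Query = Callable[[], Iterable[Hashable]]


@dataclass(frozen=True)
class Delta:
    """What changed in a query's results between two invocations."""
    added: FrozenSet[Hashable]
    removed: FrozenSet[Hashable]


class QueryRegistry:
    """Hypothetical back-end service that remembers query results by token."""

    def __init__(self) -> None:
        self._tokens = itertools.count(1)
        self._queries: Dict[int, Tuple[Query, FrozenSet[Hashable]]] = {}

    def register(self, query: Query) -> Tuple[int, FrozenSet[Hashable]]:
        # Run the query once, remember the result, and hand back a token.
        result = frozenset(query())
        token = next(self._tokens)
        self._queries[token] = (query, result)
        return token, result

    def refresh(self, token: int) -> Delta:
        # Re-run the query against current data and diff with the last result.
        query, previous = self._queries[token]
        current = frozenset(query())
        self._queries[token] = (query, current)
        return Delta(added=current - previous, removed=previous - current)
```

This saves transfer and client work, not query cost: the back-end still runs the full query on every refresh, and it has to hold the last result set for each outstanding token.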

>
>>
>> Thoughts/ideas/responses? Unless anyone has any better ideas that are 
>> not too painful, I think it's safe to say that some combination of 
>> 1, 2, & 3 will continue to be the status quo.
>
> Agents must have a way for the user to ensure that they are 
> synchronized with the back-end.  As painful and as slow as that can 
> be, this functionality has to exist.

Got it. Given the importance of this goal, we should talk concretely 
about how best to achieve it.

harry