[ome-devel] Shoola-back-end consistency issues

Harry Hochheiser hsh at nih.gov
Mon Jul 11 14:25:34 BST 2005


Josiah:

Thanks for the note.  I've got a few comments below, but first, some 
general observations.

It seems to me that something like this might work. The idea of a 
fully-generalized versioning system for every object in the database is 
pretty intriguing - I'm sure we could do lots with it. For example, 
using such information to provide views of how a characterization of 
some data evolved over time could be pretty interesting.

That said, this proposal is a fairly big change to the data 
structure/model, and not one that should be undertaken lightly. In 
terms of the specific issue at hand, I can't help but wonder if there 
would be some better way to handle consistency issues.

As I see it, the main challenge of client consistency is that we want 
the client to have a reasonably efficient way of knowing that relevant 
data has changed. By "reasonably efficient", I mean that the client 
should be able to get this information without expending lots of 
bandwidth, memory, or processing time. The naive approach of 
re-requesting the data of interest is not efficient, as much of what is 
re-requested will often be unchanged.

More efficient approaches will require _some_ state on the back end 
that can be used to identify exactly which information has changed, 
allowing only the appropriate deltas to be returned to the client.  
This state can live explicitly in the database (as in your proposal), 
or it can - perhaps more simply - be stored in the "business logic" 
that rides on top of the database - DBObject, Factory, etc.

The exact approach that we take should be determined by the problems 
that we're solving. If we're interested in a full-blown versioning 
scheme, something like this approach is probably necessary. If we're 
only trying to support consistency, some state in the client software 
might do the job more easily.

The simpler problem of client consistency has another issue associated 
with it. It feels very much to me like a problem that's probably been 
solved elsewhere. My understanding is that O/R mapping tools generally 
provide facilities for maintaining the state necessary for this sort of 
consistency checking.  To the extent that this is the case, I tend to 
think that this is another argument for moving towards those tools and 
away from our own custom solutions.

As that's obviously a whole can of worms that won't be addressed 
completely any time soon, I'm going to focus on figuring out how agents 
can be constructed to support naive consistency refreshes.

Specific comments below.

On Jul 8, 2005, at 10:47 AM, Josiah Johnston wrote:

> Sorry for jumping in late. I've spent the week recovering from some 
> spoiled eggs. I caught up on reading emails yesterday, but it took me 
> a minute to come up with something worth saying.
>
> Here's a proposed solution to versioning:
>
> *Everything* gets a SOURCE column. This is a reference to a table that 
> has MEX + Formal Output.

Josiah - this sounds like some things that came up in some discussions 
of the "multiple outputs with the same ST" problem. It seems to me that 
any modification like this should be designed to address the multiple 
output problem.

> Everything also gets a VERSION column.
> A separate version table is made for every object type (to avoid super 
> long, skinny tables that have low performance).

Have we actually verified that long, skinny tables have low performance?

> Each version table has a version number, object id, and SOURCE id.

If each version has a SOURCE id tied to it, why does the object itself 
also need a SOURCE?

> Every operation gets a MEX. Every MEX is reversible.
> Anytime something that's part of an object definition changes, its 
> version increments.
> A version table ties an object version number to a MEX.
>
ok.
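For what it's worth, the bookkeeping you describe could be simulated roughly like this (Python toy model; the table and column names are my guesses at what the proposal implies, not actual OME schema):

```python
class VersionedStore:
    """Toy model of per-type version tables: every change records a
    (version, object_id, source_id) row and bumps the object's version."""

    def __init__(self):
        self.objects = {}         # (type, id) -> current version number
        self.version_tables = {}  # type -> list of (version, object_id, source_id)

    def record_change(self, obj_type, obj_id, source_id):
        key = (obj_type, obj_id)
        new_version = self.objects.get(key, 0) + 1
        self.objects[key] = new_version
        self.version_tables.setdefault(obj_type, []).append(
            (new_version, obj_id, source_id))
        return new_version

    def current_version(self, obj_type, obj_id):
        return self.objects.get((obj_type, obj_id), 0)

store = VersionedStore()
store.record_change("DATASET", 5, source_id=101)  # e.g. images added under one MEX
store.record_change("DATASET", 5, source_id=102)  # a later change, different MEX
assert store.current_version("DATASET", 5) == 2
assert store.version_tables["DATASET"] == [(1, 5, 101), (2, 5, 102)]
```

Note that in this model the version table alone answers "what changed, and from which MEX" - which is part of why I question the redundant SOURCE on the object above.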

> With a setup like that, it is straightforward to pull out any version 
> of an object from the DB. It's also straightforward to report the 
> current object's version and give a diff to a previous version.
>
> USE CASE
>
> Shoola's zoomable dataset browser is open to a dataset. In a separate 
> agent, thousands of images are added to that dataset. The user hits 
> "refresh".
> When the images were added, all the new entries to image_dataset_map 
> got a single SOURCE id. This changed the 'images' relation that is 
> defined in OME::Dataset. A new entry was added to DATASET_VERSION with 
> the new version number, the DATASET id, and the SOURCE id. That 
> dataset's version was updated.
> When the user hits "refresh", Shoola asks the DB for the current 
> version of that dataset and realizes it doesn't match its local one. 
> A dumb agent can ask to reload the new one. A smart agent knows what 
> parts of the dataset it's displaying (images in the dataset). The 
> smart agent asks for a diff, examines the response, realizes the 
> changes made to the dataset affect the display, and updates 
> accordingly. If the changes had been to chain executions, the agent 
> would have had less work to do.
>

Fair enough.
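The "smart agent" side of this use case might look something like the following sketch (Python; `dataset_version`, `dataset_diff`, and the diff format are all stand-ins for whatever remote facade we'd actually build):

```python
def refresh(local_version, local_images, server):
    """Smart-agent refresh: compare version numbers first, then fetch
    only a diff instead of reloading the whole dataset."""
    remote_version = server.dataset_version()
    if remote_version == local_version:
        return local_version, local_images           # nothing changed
    diff = server.dataset_diff(since=local_version)  # {"added": [...], "removed": [...]}
    images = (set(local_images) | set(diff["added"])) - set(diff["removed"])
    return remote_version, sorted(images)

class FakeServer:
    """Stands in for the back end in this sketch."""
    def dataset_version(self):
        return 3
    def dataset_diff(self, since):
        return {"added": [10, 11], "removed": [2]}

version, images = refresh(2, [1, 2, 3], FakeServer())
assert version == 3
assert images == [1, 3, 10, 11]
```

The "dumb agent" is just the else-branch taken to the extreme: on any version mismatch, throw everything away and reload.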

> This solution leaves one problem unaddressed. How do we represent 
> "edit" actions? An example of an edit is changing the description of a 
> dataset. Another example is removing a classification. In the 
> abstract, all actions can be thought of as adding to a graph, editing 
> nodes or edges in a graph, or deleting from a graph. Our versioning 
> model should accommodate all of these. The model and use case I laid 
> out above are adding to a graph.
> My best idea for Edit operations is to make a copy of the old version. 
> This avoids problems with updating references. The copy needs to be 
> clearly flagged as being an old version. One way of doing that is 
> storing it in separate tables (e.g. OLD_DATASETS or 
> OLD_CLASSIFICATIONS). I think it's a relatively minor change to point 
> a DBObject at a different table. To get the old version of an object 
> that has been edited, you load the current object, load the archived 
> old version, and copy all the fields (except id) from the old to the 
> new.
>
> However, EDIT is largely a separable issue, is not the current subject 
> of conversation, and we don't handle it & provenance well in our 
> current software. I believe the model I proposed can be used for graph 
> additions, and later editing can be added on. So, I won't elaborate on 
> that subject further right now.
>

Yes, these are tricky issues indeed. I think we've got to tread 
carefully if we decide to go down this path...
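If we did go down the copy-on-edit road, the core mechanism is at least simple to state (toy Python model of the OLD_DATASETS idea - the tables here are dicts, where real code would be SQL through DBObject):

```python
import copy

def edit_with_archive(live_table, archive_table, obj_id, **changes):
    """Copy-on-edit: snapshot the current row into an archive table
    before applying an in-place update, so old versions stay readable
    and existing references to the live row never need rewriting."""
    old_row = copy.deepcopy(live_table[obj_id])
    archive_table.setdefault(obj_id, []).append(old_row)  # keep full history
    live_table[obj_id].update(changes)

datasets = {5: {"name": "worms", "description": "first pass"}}
old_datasets = {}  # plays the role of an OLD_DATASETS table
edit_with_archive(datasets, old_datasets, 5, description="second pass")
assert datasets[5]["description"] == "second pass"
assert old_datasets[5][0]["description"] == "first pass"
```

The hard parts Josiah alludes to - deletions, and edits to rows that other rows reference - are exactly what this sketch doesn't touch.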

harry
