[ome-devel] Shoola-back-end consistency issues

Josiah Johnston siah at nih.gov
Fri Jul 8 15:47:10 BST 2005


Sorry for jumping in late. I've spent the week recovering from some 
spoiled eggs. I caught up on reading emails yesterday, but it took me a 
minute to come up with something worth saying.

Here's a proposed solution to versioning:

*Everything* gets a SOURCE column. This is a reference to a table that 
has MEX + Formal Output.
Everything also gets a VERSION column.
A separate version table is made for every object type (to avoid super 
long, skinny tables that have low performance).
Each version table has a version number, object id, and SOURCE id.
Every operation gets a MEX. Every MEX is reversible.
Anytime something that's part of an object definition changes, its 
version increments.
A version table ties an object version number to a MEX.
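
The table layout above could be sketched roughly as follows. This is only an illustration, assuming SQLite through Python's sqlite3 module; the table names sources and datasets, every column name, and the helpers make_db and record_addition are hypothetical and are not taken from the OME schema. Only SOURCE, VERSION, MEX, image_dataset_map and DATASET_VERSION are named in the proposal.

```python
import sqlite3

# Hypothetical layout: one SOURCE table, one object table (datasets),
# one relation table, and one version table for the dataset object type.
SCHEMA = """
CREATE TABLE sources (
    source_id        INTEGER PRIMARY KEY,
    mex_id           INTEGER NOT NULL,  -- the MEX that performed the operation
    formal_output_id INTEGER            -- the formal output of that MEX, if any
);
CREATE TABLE datasets (
    dataset_id INTEGER PRIMARY KEY,
    name       TEXT,
    version    INTEGER NOT NULL DEFAULT 0,
    source_id  INTEGER REFERENCES sources(source_id)
);
CREATE TABLE image_dataset_map (
    image_id   INTEGER NOT NULL,
    dataset_id INTEGER NOT NULL REFERENCES datasets(dataset_id),
    source_id  INTEGER NOT NULL REFERENCES sources(source_id)
);
CREATE TABLE dataset_version (
    version    INTEGER NOT NULL,
    dataset_id INTEGER NOT NULL REFERENCES datasets(dataset_id),
    source_id  INTEGER NOT NULL REFERENCES sources(source_id),
    PRIMARY KEY (dataset_id, version)
);
"""


def make_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_addition(conn, dataset_id, image_ids, mex_id):
    """Add images under a single SOURCE id and increment the dataset version."""
    cur = conn.execute("INSERT INTO sources (mex_id) VALUES (?)", (mex_id,))
    source_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO image_dataset_map VALUES (?, ?, ?)",
        [(image_id, dataset_id, source_id) for image_id in image_ids],
    )
    conn.execute(
        "UPDATE datasets SET version = version + 1 WHERE dataset_id = ?",
        (dataset_id,),
    )
    (version,) = conn.execute(
        "SELECT version FROM datasets WHERE dataset_id = ?", (dataset_id,)
    ).fetchone()
    # The version table ties the new version number to the SOURCE (and so the MEX).
    conn.execute(
        "INSERT INTO dataset_version VALUES (?, ?, ?)",
        (version, dataset_id, source_id),
    )
    return version
```

Calling record_addition once for a batch of images gives every new image_dataset_map row the same SOURCE id and writes exactly one dataset_version row for the batch.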

With a setup like that, it is straightforward to pull out any version 
of an object from the DB. It's also straightforward to report the 
current object's version and give a diff to a previous version.
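
As a rough illustration of why this is straightforward for additions, here is a small in-memory sketch. The row types VersionRow and MapRow and the functions images_at and diff_added are hypothetical names used only for this example. It covers additions only; edits and deletions are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionRow:
    """One row of a hypothetical DATASET_VERSION table."""
    version: int
    dataset_id: int
    source_id: int


@dataclass(frozen=True)
class MapRow:
    """One row of image_dataset_map, tagged with its SOURCE id."""
    image_id: int
    dataset_id: int
    source_id: int


def images_at(version_rows, map_rows, dataset_id, version):
    """Images in the dataset as of the given version (additions only)."""
    sources = {
        r.source_id
        for r in version_rows
        if r.dataset_id == dataset_id and r.version <= version
    }
    return sorted(
        m.image_id
        for m in map_rows
        if m.dataset_id == dataset_id and m.source_id in sources
    )


def diff_added(version_rows, map_rows, dataset_id, old, new):
    """Images present at version `new` that were absent at version `old`."""
    before = set(images_at(version_rows, map_rows, dataset_id, old))
    after = set(images_at(version_rows, map_rows, dataset_id, new))
    return sorted(after - before)


# Example data: version 1 added images 1 and 2, version 2 added images 3 and 4.
VERSIONS = [VersionRow(1, 1, 100), VersionRow(2, 1, 101)]
MAPPING = [MapRow(1, 1, 100), MapRow(2, 1, 100), MapRow(3, 1, 101), MapRow(4, 1, 101)]
```

With this example data, images_at(VERSIONS, MAPPING, 1, 1) yields images 1 and 2, and diff_added(VERSIONS, MAPPING, 1, 1, 2) yields images 3 and 4.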

USE CASE

Shoola's zoomable dataset browser is open to a dataset. In a separate 
agent, thousands of images are added to that dataset. The user hits 
"refresh".
When the images were added, all the new entries to image_dataset_map 
got a single SOURCE id. This changed the 'images' relation that is 
defined in OME::Dataset. A new entry was added to DATASET_VERSION with the 
new version number, the DATASET id, and the SOURCE id. That dataset's 
version was updated.
When the user hits "refresh", Shoola asks the DB for the current 
version of that dataset and realizes it doesn't match its local one. A 
dumb agent can ask to reload the new one. A smart agent knows what 
parts of the dataset it's displaying (images in the dataset). The smart 
agent asks for a diff, examines the response, realizes the changes made 
to the dataset affect the display, and updates accordingly. If the 
changes had been to chain executions, the agent would have had less 
work to do.
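
The smart agent's decision in this use case might look something like the sketch below. The function plan_refresh and the relation names are hypothetical; a real agent would get the current version and the diff from the back end instead of taking them as arguments.

```python
def plan_refresh(local_version, current_version, changed_relations, displayed_relations):
    """Decide what a smart agent must redraw after the user hits refresh.

    Returns (new_local_version, relations_to_redraw). `changed_relations`
    stands in for the diff the agent would request from the DB.
    """
    if current_version == local_version:
        # Nothing changed since the agent last loaded the dataset.
        return local_version, set()
    # Only changes that touch what is actually on screen need a redraw.
    to_redraw = set(changed_relations) & set(displayed_relations)
    return current_version, to_redraw
```

For a browser showing only the images relation, plan_refresh(3, 4, {"images"}, {"images"}) asks for a redraw of images, while plan_refresh(3, 4, {"chain_executions"}, {"images"}) only moves the local version forward.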

This solution leaves one problem unaddressed. How do we represent 
"edit" actions? An example of an edit is changing the description of a 
dataset. Another example is removing a classification. In the abstract, 
all actions can be thought of as adding to a graph, editing nodes or 
edges in a graph, or deleting from a graph. Our versioning model should 
accommodate all of these. The model and use case I laid out above cover 
only adding to a graph.
My best idea for Edit operations is to make a copy of the old version. 
This avoids problems with updating references. The copy needs to be 
clearly flagged as being an old version. One way of doing that is 
storing it in separate tables (e.g. OLD_DATASETS or 
OLD_CLASSIFICATIONS). I think it's a relatively minor change to point a 
DBObject at a different table. To get the old version of an object that 
has been edited, you load the current object, load the archived old 
version, and copy all the fields (except id) from the old to the new.
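
The reconstruction step described above could be sketched as follows, with plain dictionaries standing in for DBObject rows. The function name restore_old_version and the field names are hypothetical.

```python
def restore_old_version(current, archived):
    """Rebuild an edited object's old version.

    Start from the current object, then copy every archived field except
    `id` over it, so the result keeps the id that other rows reference.
    """
    restored = dict(current)
    restored.update({k: v for k, v in archived.items() if k != "id"})
    return restored


# Example: the archived row lives in a separate table and has its own id.
CURRENT = {"id": 5, "name": "Set A", "description": "new text"}
ARCHIVED = {"id": 91, "name": "Set A", "description": "old text"}
OLD_VIEW = restore_old_version(CURRENT, ARCHIVED)
```

Keeping the current id on the restored object is what avoids having to update references elsewhere in the DB.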

However, EDIT is largely a separable issue and isn't the current subject 
of conversation, and our current software doesn't handle it or provenance 
well. I believe the model I proposed can be used for graph 
additions, and later editing can be added on. So, I won't elaborate on 
that subject further right now.

-Josiah


