[ome-devel] Shoola-back-end consistency issues
Josiah Johnston
siah at nih.gov
Fri Jul 8 15:47:10 BST 2005
Sorry for jumping in late. I've spent the week recovering from some
spoiled eggs. I caught up on reading emails yesterday, but it took me a
minute to come up with something worth saying.
Here's a proposed solution to versioning:
*Everything* gets a SOURCE column. This is a reference to a table that
has MEX + Formal Output.
Everything also gets a VERSION column.
A separate version table is made for every object type (to avoid super
long, skinny tables that have low performance).
Each version table has a version number, object id, and SOURCE id.
Every operation gets a MEX. Every MEX is reversible.
Anytime something that's part of an object definition changes, its
version increments.
A version table ties an object version number to a MEX.
With a setup like that, it is straightforward to pull out any version
of an object from the DB. It's also straightforward to report the
current object's version and give a diff to a previous version.
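
A minimal sketch of the tables described above, using Python with an in-memory SQLite database purely for illustration. The table and column names and the helpers add_images, images_at, and diff_images are assumptions made for this example; they are not existing OME schema or API.

```python
import sqlite3

# Hypothetical schema: one SOURCE table (MEX + formal output), a SOURCE
# column on the map table, and a per-type version table for datasets.
SCHEMA = """
CREATE TABLE source (id INTEGER PRIMARY KEY, mex_id INTEGER, formal_output TEXT);
CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT, version INTEGER);
CREATE TABLE image_dataset_map (image_id INTEGER, dataset_id INTEGER, source_id INTEGER);
CREATE TABLE dataset_version (version INTEGER, dataset_id INTEGER, source_id INTEGER);
"""

def add_images(db, dataset_id, image_ids, mex_id):
    """Add images under a single SOURCE and increment the dataset version."""
    cur = db.execute(
        "INSERT INTO source (mex_id, formal_output) VALUES (?, 'images')", (mex_id,))
    source_id = cur.lastrowid
    db.executemany(
        "INSERT INTO image_dataset_map VALUES (?, ?, ?)",
        [(i, dataset_id, source_id) for i in image_ids])
    db.execute("UPDATE datasets SET version = version + 1 WHERE id = ?", (dataset_id,))
    (version,) = db.execute(
        "SELECT version FROM datasets WHERE id = ?", (dataset_id,)).fetchone()
    # The version table ties the new version number to the SOURCE (and so to the MEX).
    db.execute("INSERT INTO dataset_version VALUES (?, ?, ?)",
               (version, dataset_id, source_id))
    return version

def diff_images(db, dataset_id, old, new):
    """Images added after version `old`, up to and including version `new`."""
    rows = db.execute(
        """SELECT m.image_id
           FROM image_dataset_map m
           JOIN dataset_version v
             ON v.source_id = m.source_id AND v.dataset_id = m.dataset_id
           WHERE m.dataset_id = ? AND v.version > ? AND v.version <= ?
           ORDER BY m.image_id""",
        (dataset_id, old, new))
    return [r[0] for r in rows]

def images_at(db, dataset_id, version):
    """Images that belonged to the dataset as of `version`."""
    return diff_images(db, dataset_id, 0, version)

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO datasets VALUES (1, 'demo', 0)")
v1 = add_images(db, 1, [10, 11], mex_id=100)
v2 = add_images(db, 1, [12], mex_id=101)
```

Because every row carries a SOURCE id and every version row points at one SOURCE, both "what did version N look like" and "what changed between N and M" reduce to a join on the version table. This sketch covers additions only.
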
USE CASE
Shoola's zoomable dataset browser is open to a dataset. In a separate
agent, thousands of images are added to that dataset. The user hits
"refresh".
When the images were added, all the new entries to image_dataset_map
got a single SOURCE id. This changed the 'images' relation that is
defined in OME::Dataset. A new entry was added to DATASET_VERSION with the
new version number, the DATASET id, and the SOURCE id. That dataset's
version was updated.
When the user hits "refresh", Shoola asks the DB for the current
version of that dataset and realizes it doesn't match its local one. A
dumb agent can ask to reload the new one. A smart agent knows what
parts of the dataset it's displaying (images in the dataset). The smart
agent asks for a diff, examines the response, realizes the changes made
to the dataset affect the display, and updates accordingly. If the
changes had been to chain executions, the agent would have had less
work to do.
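
The refresh logic in this use case could look roughly like the following. This is a sketch under stated assumptions: DatasetStore stands in for the back end, SmartBrowser for the Shoola agent, and the relation names "images" and "chain_executions" are illustrative labels. None of these are real OME or Shoola classes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VersionEntry:
    version: int
    source_id: int
    relation: str   # which part of the object definition changed
    added: tuple    # items added by this SOURCE

class DatasetStore:
    """Hypothetical stand-in for the back end holding one dataset's version log."""
    def __init__(self):
        self.log = []

    def current_version(self):
        return self.log[-1].version if self.log else 0

    def add(self, relation, items, source_id):
        entry = VersionEntry(self.current_version() + 1, source_id,
                             relation, tuple(items))
        self.log.append(entry)
        return entry.version

    def diff(self, since):
        """All version entries newer than `since`."""
        return [e for e in self.log if e.version > since]

class SmartBrowser:
    """Hypothetical agent that only displays the 'images' relation."""
    def __init__(self, store):
        self.store = store
        self.local_version = store.current_version()
        self.images = [i for e in store.log
                       if e.relation == "images" for i in e.added]

    def refresh(self):
        """Apply only the changes that affect the display; return images added."""
        if self.store.current_version() == self.local_version:
            return 0  # versions match, nothing to do
        new = 0
        for entry in self.store.diff(self.local_version):
            if entry.relation == "images":
                self.images.extend(entry.added)
                new += len(entry.added)
            # Changes to other relations do not affect this view.
        self.local_version = self.store.current_version()
        return new

store = DatasetStore()
store.add("images", [1, 2], source_id=1)
browser = SmartBrowser(store)
store.add("images", range(3, 1003), source_id=2)      # another agent adds 1000 images
store.add("chain_executions", ["ce1"], source_id=3)   # unrelated change
added = browser.refresh()
```

A dumb agent would skip the diff and rebuild its view from scratch whenever the version numbers differ; the version check is the same in both cases.
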
This solution leaves one problem unaddressed. How do we represent
"edit" actions? An example of an edit is changing the description of a
dataset. Another example is removing a classification. In the abstract,
all actions can be thought of as adding to a graph, editing nodes or
edges in a graph, or deleting from a graph. Our versioning model should
accommodate all of these. The model and use case I laid out above cover
only additions to a graph.
My best idea for Edit operations is to make a copy of the old version.
This avoids problems with updating references. The copy needs to be
clearly flagged as being an old version. One way of doing that is
storing it in separate tables (e.g. OLD_DATASETS or
OLD_CLASSIFICATIONS). I think it's a relatively minor change to point a
DBObject at a different table. To get the old version of an object that
has been edited, you load the current object, load the archived old
version, and copy all the fields (except id) from the old to the new.
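
The copy-on-edit idea can be sketched as follows, with plain dictionaries standing in for the DATASETS and OLD_DATASETS tables. The field names, the dataset_id back-reference on the archived row, and the helpers edit and load_old are assumptions for illustration only.

```python
# Hypothetical in-memory stand-ins for the DATASETS and OLD_DATASETS tables.
datasets = {7: {"id": 7, "name": "demo", "description": "first draft", "version": 1}}
old_datasets = []

def edit(dataset_id, **changes):
    """Archive a copy of the current row, then apply the edit in place."""
    row = datasets[dataset_id]
    archived = dict(row)
    archived["id"] = len(old_datasets) + 1   # the archive row gets its own id
    archived["dataset_id"] = row["id"]       # back-reference to the live object
    old_datasets.append(archived)
    row.update(changes)
    row["version"] += 1
    return row["version"]

def load_old(dataset_id, version):
    """Load the current object, then overlay the archived fields (except id)."""
    obj = dict(datasets[dataset_id])
    archived = next(r for r in old_datasets
                    if r["dataset_id"] == dataset_id and r["version"] == version)
    for key, value in archived.items():
        if key not in ("id", "dataset_id"):
            obj[key] = value
    return obj

new_version = edit(7, description="second draft")
previous = load_old(7, 1)
```

Since the live row keeps its id, anything that references the dataset keeps pointing at the same object, which is the reason for archiving the copy instead of the original.
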
However, EDIT is largely a separable issue, isn't the current subject
of conversation, and we don't deal with it & provenance well in our
current software. I believe the model I proposed can be used for graph
additions, and later editing can be added on. So, I won't elaborate on
that subject further right now.
-Josiah