[ome-devel] Shoola-back-end consistency issues
Harry Hochheiser
hsh at nih.gov
Mon Jul 11 14:25:34 BST 2005
Josiah:
Thanks for the note. I've got a few comments below, but first, some
general observations.
It seems to me that something like this might work. The idea of a
fully-generalized versioning system for every object in the database is
pretty intriguing - I'm sure we could do lots with it. For example,
using such information to provide views of how a characterization of
some data evolved over time could be pretty interesting.
That said, this proposal is a fairly big change to the data
structure/model, and not one that should be undertaken lightly. In
terms of the specific issue at hand, I can't help but wonder if there
would be some better way to handle consistency issues.
As I see it, the main challenge of client consistency is that we want
the client to have a reasonably efficient way of knowing that relevant
data has changed. By "reasonably efficient", I mean that the client
should be able to get this information without having to expend lots of
bandwidth, memory, or processing time. The naive approach of
re-requesting all of the data of interest is not efficient, as much of
what is re-requested will often be unchanged.
More efficient approaches will require _some_ state on the back end
that can be used to identify exactly the information that has changed,
allowing only the appropriate deltas to be returned to the client.
This state can live explicitly in the database (as in your proposal),
or it can - perhaps more simply - be stored in the "business logic"
that rides on top of the database (DBObject, Factory, etc.).
The exact approach that we take should be determined by the problems
that we're solving. If we're interested in a full-blown versioning
scheme, something like this approach is probably necessary. If we're
only trying to support consistency, some state in the client software
might do the job more easily.
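To make the "deltas" idea concrete, here's a minimal sketch of the kind
of version check I have in mind. All of the names here (VersionStore,
record_change, delta_since) are hypothetical - this is not existing OME
code, just an illustration of keeping enough server-side state to hand
back only what changed:

```python
# Minimal sketch of server-side state for client consistency checks.
# Hypothetical names throughout; not part of the OME code base.

class VersionStore:
    """Tracks a current version number and a change log per object."""

    def __init__(self):
        self.versions = {}   # object id -> current version number
        self.changes = {}    # object id -> list of (version, change)

    def record_change(self, obj_id, change):
        # Every mutation bumps the object's version and logs the delta.
        new_version = self.versions.get(obj_id, 0) + 1
        self.versions[obj_id] = new_version
        self.changes.setdefault(obj_id, []).append((new_version, change))

    def delta_since(self, obj_id, client_version):
        """Return only the changes the client has not yet seen."""
        return [c for v, c in self.changes.get(obj_id, [])
                if v > client_version]

store = VersionStore()
store.record_change(42, "added image 1001")
store.record_change(42, "added image 1002")

# A client that last saw version 1 asks only for the delta:
print(store.delta_since(42, 1))   # ['added image 1002']
```

The point is just that the server needs *some* memory of what changed
between versions; whether that memory lives in tables or in the
business logic is the open question.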
The simpler problem of client consistency has another issue associated
with it. It feels very much to me like a problem that's probably been
solved elsewhere. My understanding is that O/R mapping tools generally
provide facilities for maintaining the state necessary for this sort of
consistency checking. To the extent that this is the case, I tend to
think that this is another argument for moving towards those tools and
away from our own custom solutions.
As that's obviously a whole can of worms that won't be addressed
completely any time soon, I'm going to focus on figuring out how agents
can be constructed to support naive consistency refreshes.
Specific comments below.
On Jul 8, 2005, at 10:47 AM, Josiah Johnston wrote:
> Sorry for jumping in late. I've spent the week recovering from some
> spoiled eggs. I caught up on reading emails yesterday, but it took me
> a minute to come up with something worth saying.
>
> Here's a proposed solution to versioning:
>
> *Everything* gets a SOURCE column. This is a reference to a table that
> has MEX + Formal Output.
Josiah - this sounds like some things that came up in some discussions
of the "multiple outputs with the same ST" problem. It seems to me that
any modification like this should be designed to address the multiple
output problem.
> Everything also gets a VERSION column.
> A separate version table is made for every object type (to avoid super
> long, skinny tables that have low performance).
Have we actually verified that long, skinny tables have low performance?
> Each version table has a version number, object id, and SOURCE id.
If the version has a source id tied to it, why does the object need a
source ?
> Every operation gets a MEX. Every MEX is reversible.
> Anytime something that's part of an object definition changes, its
> version increments.
> A version table ties an object version number to a MEX.
>
ok.
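Just to make sure I'm reading you right, here's roughly the schema I
think you're describing (SQLite used purely for illustration; every
table and column name below is my guess at your intent, not the actual
OME schema):

```python
# Illustrative-only sketch of the proposed SOURCE/VERSION schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (
    source_id        INTEGER PRIMARY KEY,
    mex_id           INTEGER NOT NULL,   -- module execution
    formal_output_id INTEGER NOT NULL
);
CREATE TABLE dataset (
    dataset_id INTEGER PRIMARY KEY,
    version    INTEGER NOT NULL DEFAULT 1,
    source_id  INTEGER REFERENCES source(source_id)
);
-- one version table per object type, to avoid one long, skinny table
CREATE TABLE dataset_version (
    dataset_id INTEGER REFERENCES dataset(dataset_id),
    version    INTEGER NOT NULL,
    source_id  INTEGER REFERENCES source(source_id)
);
""")
```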
> With a setup like that, it is straightforward to pull out any version
> of an object from the DB. It's also straightforward to report the
> current object's version and give a diff to a previous version.
>
> USE CASE
>
> Shoola's zoomable dataset browser is open to a dataset. In a separate
> agent, thousands of images are added to that dataset. The user hits
> "refresh".
> When the images were added, all the new entries to image_dataset_map
> got a single SOURCE id. This changed the 'images' relation that is
> defined in OME::Dataset. A new entry was added to DATASET_VERSION
> with the new version number, the DATASET id, and the SOURCE id. That
> dataset's version was updated.
> When the user hits "refresh", Shoola asks the DB for the current
> version of that dataset and realizes it doesn't match its local one.
> A dumb agent can ask to reload the new one. A smart agent knows what
> parts of the dataset it's displaying (images in the dataset). The
> smart agent asks for a diff, examines the response, realizes the
> changes made to the dataset affect the display, and updates
> accordingly. If the changes had been to chain executions, the agent
> would have had less work to do.
>
Fair enough.
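The dumb-vs-smart agent distinction seems like the right framing. A
quick sketch of what the smart agent's refresh might look like, with
everything (the dict shapes, the function name) made up for
illustration:

```python
# Sketch of a smart agent's refresh: compare versions, then apply only
# the diff to the parts of the dataset it actually displays.
# All names and structures here are hypothetical.

def refresh(local, server):
    """local/server are dicts with 'version' and 'images' (a set of ids)."""
    if server["version"] == local["version"]:
        return local                      # nothing changed; no work to do
    # A dumb agent would reload everything here. The smart agent
    # applies only the delta to its display state:
    added = server["images"] - local["images"]
    removed = local["images"] - server["images"]
    local["images"] |= added
    local["images"] -= removed
    local["version"] = server["version"]
    return local

local = {"version": 3, "images": {1, 2}}
server = {"version": 4, "images": {1, 2, 3}}
print(sorted(refresh(local, server)["images"]))   # [1, 2, 3]
```

If the changes had been to something the agent doesn't display (chain
executions, say), the version check would still fire but the diff would
be empty from the agent's point of view.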
> This solution leaves one problem unaddressed. How do we represent
> "edit" actions? An example of an edit is changing the description of a
> dataset. Another example is removing a classification. In the
> abstract, all actions can be thought of as adding to a graph, editing
> nodes or edges in a graph, or deleting from a graph. Our versioning
> model should accommodate all of these. The model and use case I laid
> out above are adding to a graph.
> My best idea for Edit operations is to make a copy of the old version.
> This avoids problems with updating references. The copy needs to be
> clearly flagged as being an old version. One way of doing that is
> storing it in separate tables (e.g. OLD_DATASETS or
> OLD_CLASSIFICATIONS). I think it's a relatively minor change to point
> a DBObject at a different table. To get the old version of an object
> that has been edited, you load the current object, load the archived
> old version, and copy all the fields (except id) from the old to the
> new.
>
> However, EDIT is largely a separable issue, isn't the current subject
> of conversation, and we don't deal with it & provenance well in our
> current software. I believe the model I proposed can be used for graph
> additions, and later editing can be added on. So, I won't elaborate on
> that subject further right now.
>
Yes, these are tricky issues indeed. I think we've got to tread
carefully if we decide to go down this path...
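For what it's worth, here's roughly how I picture the copy-to-archive
idea working - again SQLite for illustration only, and the table and
function names are mine, not anything in OME:

```python
# Sketch of copy-on-edit: before an edit, archive the current row in an
# OLD_* table, then update in place. Illustrative names throughout.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datasets (id INTEGER PRIMARY KEY, version INTEGER,
                       description TEXT);
CREATE TABLE old_datasets (id INTEGER, version INTEGER, description TEXT);
""")
conn.execute("INSERT INTO datasets VALUES (1, 1, 'original text')")

def edit_description(conn, dataset_id, new_text):
    # Archive the current row so the old version stays retrievable...
    conn.execute("INSERT INTO old_datasets "
                 "SELECT id, version, description FROM datasets "
                 "WHERE id = ?", (dataset_id,))
    # ...then edit in place, bumping the version.
    conn.execute("UPDATE datasets SET version = version + 1, "
                 "description = ? WHERE id = ?", (new_text, dataset_id))

edit_description(conn, 1, "edited text")
```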
harry