[ome-devel] The Zen of OME: philosophical questions on data models
and functionality
Harry Hochheiser
hsh at nih.gov
Mon Dec 5 14:57:42 GMT 2005
A recent private discussion of use cases, user needs, and related
issues has reached the point where it seemed appropriate to move it
onto this list. I'm going to try to summarize a few points that
were raised and provide some new comments. Others, please let me know
if I've mischaracterized things.
Three main topics have been kicked around: appropriate strategies for
data modeling, the handling of binary data, and the need for lower
barriers to entry with OME, particularly with respect to analysis.
As I write this note, I realize that these topics all reflect the
ongoing tension between idealized design and the dirty muck that is
reality. OME was designed to have a fairly "pure" view of the world:
even though the data model is extensible, the applications, data
types, and analysis tools would be very well-defined and constructed
to adhere to OME rules that would make things work together nicely.
Unfortunately, the world is rarely so accommodating, and even less so
in research. Thus, using OME presents some practical challenges,
particularly for users who have not "drunk the kool-aid" and been
completely convinced of the appropriateness of the OME approach.
The first manifestation of this tension that I'll mention is the
question of data modeling. OME's extensible data model allows users
to create new semantic types, which can contain structured data as
inputs to or outputs from analysis. Run-time import tools can be used
to create new types when needed, making things in theory very flexible.
However, the ST model is, in some other ways, not so flexible. Once a
type has been added, it can't be changed easily. Data can be added by
defining new STs that refer to the original, but fields aren't easily
removed. In some sense, this means that STs are great for defining
models that are already clearly understood and fairly mature. If the
data modeling needs are not well-understood, defining STs may be
premature.
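To make the "add but don't change" constraint concrete, here is a minimal sketch in Python. It is not the OME API: SemanticTypeDef, TypeRegistry, and the field names are all hypothetical, invented only to illustrate the idea that an existing type is frozen and new data is attached by defining a second type that refers to the first.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SemanticTypeDef:
    """Hypothetical stand-in for an ST definition (not the OME API)."""
    name: str
    fields: tuple                    # tuple of (field_name, type_name) pairs
    refers_to: Optional[str] = None  # name of the ST this one refers to, if any


class TypeRegistry:
    """Toy registry: types can be added, but never altered or removed."""

    def __init__(self) -> None:
        self._types: Dict[str, SemanticTypeDef] = {}

    def add(self, st: SemanticTypeDef) -> None:
        if st.name in self._types:
            raise ValueError(f"{st.name} already exists and cannot be redefined")
        if st.refers_to is not None and st.refers_to not in self._types:
            raise ValueError(f"unknown referenced type {st.refers_to}")
        self._types[st.name] = st

    def all_fields(self, name: str) -> list:
        """Collect fields from a type and every type it refers to, base first."""
        collected: list = []
        current = self._types.get(name)
        while current is not None:
            collected = list(current.fields) + collected
            current = self._types.get(current.refers_to) if current.refers_to else None
        return collected


registry = TypeRegistry()
# The original type is fixed once added ...
registry.add(SemanticTypeDef("DevelopmentalAge", (("hours", "float"),)))
# ... so extra data goes into a new type that refers back to it.
registry.add(SemanticTypeDef("DevelopmentalAgeNote", (("comment", "string"),),
                             refers_to="DevelopmentalAge"))
```

Note that nothing here offers a way to drop the "hours" field later, which is the inflexibility described above.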
The category group/category model is a more lightweight approach.
Having Category Groups and Categories defined as STs, a user can
create a new CategoryGroup (instance of a CategoryGroup ST) and
associate one or more Category instances with that group. These
Category instances can then be associated with images, with the
presumed interpretation that categories in a category group are
mutually exclusive.
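As a rough illustration of that model, the sketch below keeps groups, categories, and image classifications in plain Python dictionaries. The class and method names are hypothetical, not OME code, and treating "mutually exclusive" as "a new classification in the same group replaces the old one" is my assumption about one reasonable interpretation.

```python
from collections import defaultdict


class CategoryStore:
    """Toy model of CategoryGroups, Categories, and image classifications."""

    def __init__(self):
        self.groups = defaultdict(set)        # group name -> category names
        self.assignments = defaultdict(dict)  # image id -> {group: category}

    def add_category(self, group, category):
        """Categories can be added to a group at any time."""
        self.groups[group].add(category)

    def classify(self, image_id, group, category):
        """Assign a category to an image; at most one category per group."""
        if category not in self.groups.get(group, set()):
            raise KeyError(f"{category!r} is not a category in group {group!r}")
        # Assumption: a later classification in the same group replaces
        # the earlier one, which enforces mutual exclusivity.
        self.assignments[image_id][group] = category


store = CategoryStore()
store.add_category("Stage", "early")
store.add_category("Stage", "late")
store.classify(1, "Stage", "early")
store.classify(1, "Stage", "late")  # replaces "early" for image 1
```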
This approach has the advantage of being drop-dead simple and, as it
does not require any ST definition, it does not lead to any
"cluttering" of the data space with STs that might need to be
deprecated. New Categories are also easy to add at any time.
(Certainly, this would also be true if a user defined one ST to act
as a "CategoryGroup" and another ST to define instances of that
group, but this approach is less lightweight).
There are, however, a couple of problems with this approach. Unlike
more general STs, categories cannot have additional information
stored with them. Using categories computationally also runs the risk of
making things a bit less clear and rigorous. For example, if I have a
module that outputs an ST which is "DevelopmentalAge", I can see
from the module description what it's doing. However, if I have an
output which is a "Category", I don't really know what's going on
unless I look a bit more deeply.
So, where does this leave the poor developer or user? STs are good,
but require fully defined, well-understood uses. Category Groups are
appealing and simple, but limited.
My take is that users should start with Category Groups and
Categories where they work, and move to STs when they can: i.e., when
data models are clearly understood.
How to make this distinction? Who knows? Data modeling is very hard.
Folks in the library community still have trouble defining seemingly
simple concepts like "author". Despite all of the efforts of the
semantic web community to rigorously model data on the web, it's
arguable that the informal, "folksonomies" approach - which is very
similar to category groups and categories - is having a larger
impact. These problems are not unique to OME, and we're not going to
be able to solve them.
Binary data raises similar questions: OME's approach is to
explain and document each bit of data, making it as easy to interpret
as possible. Binary data is by definition opaque, requiring external
knowledge to interpret it correctly. The preference would be,
whenever possible, to avoid the use of arbitrary binary data - I'm
guessing that this argument is not too controversial. However, there
might be times when the benefits of adding such data are significant.
My hunch is that the decision about such things should be based on a
cost-benefit analysis involving factors such as the benefits gained,
the cost of implementation, the cost to the data model, and other
questions. If there's a significant benefit, and things can be done
easily without leading to problems with the data model, then sure,
why not? I'd say that this is particularly true if we can do it via
custom modules and STs that need not be part of "core" OME.
On the other hand, saying that OME shouldn't handle binary data
because it somehow offends aesthetics or some sense of philosophical
purity seems to me to be the wrong approach. This, however, is a
straw man - I don't hear anybody making this argument.
As far as lowering barriers to entry, it's clear that this needs to
happen. One thing that would help me would be a concise list of
needs: what are the top items (ideally, prioritized) that would get
some new folks to jump in? I think it would also help for us to come
up with some scenarios describing how people might combine OME with
current work practices.
For example, users with ever-changing home-grown analyses might
simply want to define STs as needed to store data and then import
external analysis results. If we could describe how to do this, and
perhaps provide appropriate tools, that might provide them with a
clear enough view of how to proceed.
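One possible shape for such a tool, sketched in Python under heavy assumptions: the external analysis writes a CSV file, and the user supplies a mapping from column names to types that plays the role of an ad hoc ST definition. The function name, the column names, and the CSV format are all hypothetical; nothing here talks to OME itself.

```python
import csv
import io


def load_results(csv_text, field_types):
    """Parse CSV text into typed records, one dict per row.

    field_types maps column name -> converter (e.g. int, float, str).
    Columns not listed are ignored; a listed column that is missing
    from the file raises KeyError.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        records.append({name: convert(row[name])
                        for name, convert in field_types.items()})
    return records


# Hypothetical output of a home-grown analysis.
results_csv = "image_id,nucleus_count,mean_intensity\n1,12,0.5\n2,7,0.25\n"

# Hypothetical field list standing in for an ST defined "as needed".
nucleus_stats = {"image_id": int, "nucleus_count": int, "mean_intensity": float}

records = load_results(results_csv, nucleus_stats)
```

Each record could then be handed to whatever import mechanism actually stores ST instances; that step is deliberately left out because it depends on the real tools.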
thoughts?
-harry