[ome-devel] Shoola-back-end consistency issues

Thu Jul 7 14:51:16 BST 2005

On Jul 7, 2005, at 3:38 AM, Josh Moore wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Ilya Goldberg wrote:
>> On Jul 6, 2005, at 4:23 PM, Josh Moore wrote:
>>>>>> Ilya Goldberg wrote:
>
>>> I thought you were all for representing things in the DB as much as
>>> possible.  I certainly am.
>
> As we've discussed, I'm _not_ for the DB being the API; I'm for the DB
> being the model. The entire definition of what our objects are should
> exist in a single language. Whether OWL or SQL Schema or if it must be
> XML Schema I don't care, I'll generate the others with an isomorphism.
> But the _logic_ for working with that model, the API, I want somewhere
> else. Not DB triggers, and not spread over multiple languages.

I think we're on the same page here.  I'm not advocating triggers  
seriously.

>
>
>>> I know it's poison to your ears, but interoperability is achieved in the
>>> Enterprise using "real" programming languages like VBA, and tools like
>>> Excel, and SpotFire, and such things.
>
> Sorry, a misunderstanding. I have nothing against Excel or SpotFire as
> clients, nor VBA as a language. I meant I don't want to be doing this
> stuff with procedural languages in the DB. The Enterprise is about
> making all kinds of legacy things work together. I'm for that; I just
> want lots of tools to do it for me!
>
> By the way, probably the easiest/simplest thing to do for Excel is have
> a service spit out CSV. One sheet of your spreadsheet gets updated and
> the rest just "refreshes". I know it's not an ODBC/JDBC connector or
> anything, but, well, we know what I think about that.

We've got some CSV things, and those are OK.  But there are still holes
left which require API-level access (like logging in, better queries,
etc.).  CSV is simple and good enough, and we have that:
http://localhost/perl2/serve.pl?Page=OME::Web::DBObjTable&SessionKey=ce11152b03731da0832260ca336b53e8&Type=@Signal&@Signal.module_execution_id=187&Format=txt
Having a good way to get the SessionKey and module_execution_id using
our query tools via the API would be better, though.
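
As a rough illustration of what a client has to do today, here is a minimal Python sketch that builds a URL shaped like the one above and parses the returned text. The helper names (table_url, parse_table) are made up for this example, and the tab delimiter for Format=txt is an assumption, not something confirmed here. The SessionKey still has to come from somewhere else, which is exactly the hole mentioned above.

```python
from urllib.parse import urlencode
import csv
import io

BASE_URL = "http://localhost/perl2/serve.pl"  # taken from the example URL


def table_url(session_key, st_name, mex_id, fmt="txt"):
    """Build a DBObjTable export URL shaped like the hand-written example."""
    params = {
        "Page": "OME::Web::DBObjTable",
        "SessionKey": session_key,
        "Type": "@" + st_name,
        "@%s.module_execution_id" % st_name: str(mex_id),
        "Format": fmt,
    }
    # Keep ':' and '@' literal so the result matches the hand-written form.
    return BASE_URL + "?" + urlencode(params, safe=":@")


def parse_table(text, delimiter="\t"):
    """Parse exported text into dicts. The delimiter is an assumption."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


url = table_url("ce11152b03731da0832260ca336b53e8", "Signal", 187)
```

The resulting url string can be fetched with any HTTP client (or pasted into Excel's web query), and parse_table can be pointed at the response body once the real delimiter is known.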

>>>

> [Now to the nuts and bolts stuff]
>
>> What does the "in effect" modifier in the above mean?
>>
>>> It's versioning without a serial version number.  If you're satisfied
>>> with a MEX ID as a version number, then it's versioning.
>
> Think my question was more regarding the connections. Is _all_ data
> currently tied to a MEX? And not just some MEX through the graph, but
> truly a MEX which represents the creation (i.e. INSERT) or update of a
> row in the database? Let's look at the example of the image_dataset_map.
>
>> What are the side-effects you're talking about?
>>
>>> There are just some edge-cases to work through.  Previously the
>>> assumption was that a given dataset has a single 'version' of the images
>>> it contains.  Once somebody had something to say about the dataset (i.e.
>>> it acquired an 'attribute', and hence a MEX with that dataset as a
>>> target), the dataset was considered locked, and its image content could
>>> not be changed.
>>
>>> If images are mapped to datasets using an object tied to a MEX, it means
>>> that the collection of images in a dataset is 'versioned'.  This means
>>> that we don't ever have to lock datasets, but now we must identify the
>>> 'version' when we apply an analysis chain to it (or look at it in a
>>> viewer).
>
> I think this was a lot of the issues that Josiah tried to inform us about in
> Baltimore. What's the status with all that, J? Having more information
> in our maps seems beneficial and cleaner at a first glance, but does it
> get you all the requirements you have for the feds?

Feds? You mean the 21 CFR Part 11 stuff?  We need to have an audit trail.
The maps that do the many-to-many for the non-ST objects are also not
STs, and therefore don't have MEXes.
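
To make the idea concrete, here is a minimal Python sketch of what an image-dataset map with a MEX on every row could give us for the audit trail. Everything in it is hypothetical: the row layout, the "added" flag for recording removals, and the assumption that MEX IDs increase over time so that a MEX ID can serve as the dataset 'version'. It is not how the current maps work, since they have no MEXes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageDatasetMapRow:
    dataset_id: int
    image_id: int
    mex_id: int          # MEX that recorded this change (hypothetical column)
    added: bool = True   # False records a removal (hypothetical column)


def images_at(rows, dataset_id, mex_id):
    """Image IDs in the dataset as of the given MEX, i.e. its 'version'."""
    members = set()
    # Assumes MEX IDs increase monotonically with time.
    for row in sorted(rows, key=lambda r: r.mex_id):
        if row.dataset_id != dataset_id or row.mex_id > mex_id:
            continue
        if row.added:
            members.add(row.image_id)
        else:
            members.discard(row.image_id)
    return members


rows = [
    ImageDatasetMapRow(1, 10, mex_id=100),
    ImageDatasetMapRow(1, 11, mex_id=101),
    ImageDatasetMapRow(1, 10, mex_id=102, added=False),
]
```

With rows like these, nothing is ever overwritten: an analysis chain or a viewer names the MEX it cares about and gets the matching image set, and the rows themselves are the audit trail.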

>
> Where all would such new MEXes need to be added to make this work? We
> listed the maps. Are there any use cases where one MEX on a row would
> not be sufficient, i.e. two "MODULES" (we'll get to this next) could be
> seen to change a row within one action/execution/transaction? This
> would be nasty. Would we need a "timestamp/version_module_execution_map"?

OMG.  Why two MEXes?  Can two modules issue the very same output?  It's
a logical fallacy.

>
>
>>> You don't have to think of a module as an algorithm either.  If you
>>> think of it as a construction that has inputs and outputs, one that can
>>> be "executed" to produce its outputs, then anything you do can be
>>> considered a module execution.  Please, let's not call it an
>>> instantiation or something.
>
> Alright. What are the constraints on a "module execution" to keep the DB
> in a consistent state for the AE? If I can say (within a single
> transaction):
>
>  d.addImage(i);
>  m = server.createModuleExecution();
>  server.update(d,m);
>
> Super. I'm pretty sure all that needs to be hidden from the clients.
> They say:
>
>  d.addImage(i).save();

The API follows the second style, I believe.  A DatasetManager class  
has a helper method to add an Image to a Dataset "the right way" using  
a single client call.
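
For the sake of discussion, here is a minimal Python sketch of that second style: one client call that adds the image, creates the MEX, and saves. The Server class is an in-memory stand-in, and all names and signatures (create_module_execution, update, add_image, the "Dataset edit" module name) are invented for the example rather than copied from the real DatasetManager. A real back end would also wrap the three steps in one transaction, which this sketch leaves out.

```python
import itertools


class Server:
    """In-memory stand-in for the back end (hypothetical)."""

    def __init__(self):
        self._mex_ids = itertools.count(1)
        self.saved = []  # (dataset name, image ids, mex id) records

    def create_module_execution(self, module_name):
        return {"id": next(self._mex_ids), "module": module_name}

    def update(self, dataset, mex):
        self.saved.append((dataset.name, tuple(dataset.images), mex["id"]))


class Dataset:
    def __init__(self, name):
        self.name = name
        self.images = []


class DatasetManager:
    """Hides the MEX bookkeeping from the client."""

    def __init__(self, server):
        self._server = server

    def add_image(self, dataset, image_id):
        """Add the image, create the MEX, and save, in one client call."""
        dataset.images.append(image_id)
        mex = self._server.create_module_execution("Dataset edit")
        self._server.update(dataset, mex)
        return mex


server = Server()
manager = DatasetManager(server)
d = Dataset("screen-1")
mex = manager.add_image(d, 42)
```

The client only ever sees manager.add_image(d, 42); whether the thing recorded behind it is called a module, an action, or something else is invisible at that level.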

>
> I still think defining "modules" like this can alienate users. Perhaps
> that's not an issue. Makes sense when you're building an analysis chain,
> but when a user starts reading about his/her modules from hitting a save
> button, gets kind of esoteric.

Well, I don't know.  The idea is that everything you do is tied to a  
module - an action, whatever.  What do you suggest we call these things  
that would encompass modules and actions?  They go through the same  
mechanism, so how would calling them different things help the user?

>
>>> Uh, no.  They do get mexes, just not from the AE.  The MEX column in
>>> each attribute table now has a NOT NULL constraint, remember?
>
> For the things with MEXes. We still have to make everything an ST before
> this holds.

That's true.  Only STs have MEXes.  I'm not sure that making maps STs  
before making the DBObjects STs would hurt anything though.
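
Here is a small, self-contained sketch of what the NOT NULL constraint buys us, using Python's sqlite3 module as a stand-in for the real database. The table and column names are illustrative only and are not our actual schema. A map table turned into an ST would get the same kind of column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE module_executions (
        id INTEGER PRIMARY KEY,
        module TEXT NOT NULL
    );
    CREATE TABLE signals (
        id INTEGER PRIMARY KEY,
        value REAL,
        module_execution_id INTEGER NOT NULL
            REFERENCES module_executions(id)
    );
""")
conn.execute("INSERT INTO module_executions (id, module) VALUES (187, 'Import')")
conn.execute("INSERT INTO signals (value, module_execution_id) VALUES (0.5, 187)")


def insert_without_mex(connection):
    """Return True if the database rejects a row that has no MEX."""
    try:
        connection.execute("INSERT INTO signals (value) VALUES (0.7)")
    except sqlite3.IntegrityError:
        return True
    return False


rejected = insert_without_mex(conn)
```

The point is that the database itself refuses an attribute row with no MEX, so an import or annotation API that forgets to register its module execution fails immediately instead of leaving untraceable data behind.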

>
>
>>> This is done with a bit of API that's targeted at import or annotation,
>>> etc.  This API registers that a certain module ("Import", "Annotation",
>>> etc.) executed and produced the recorded result.  Parts of the AE are
>>> indeed used to accomplish this, but it has to do with instantiating STs
>>> rather than executing an analysis chain, so it's not done via the AE.
>>> The AE is used to do only one thing:  Execute an analysis chain against
>>> a dataset.
>>> In principle, the AE can be used for things like this, but it was easier
>>> to do them through a specialized API rather than shoehorning this into
>>> the AE interface.
>
> This is fine. This is exactly the kind of thing I was talking about way
> up top. An API sitting in a server doing the stuff it's supposed to do. I
> think very much that the AE, whether its code, its process, or simply
> its "memes" (in terms of user documentation and other cultural goods),
> needs to be refactored to let the server do instantiation, etc. and have
> all the logic sit in one place. (Don't mean AE logic, I mean the logic
> to deal with our model.)

It does all sit in one place (more or less).  We don't have clients
executing arbitrary modules yet, only specific tasks which are handled
by the API on the back-end.  We talked about a general AE-ish API in
Baltimore that would let clients execute arbitrary modules, and I think
there was an actual plan.  It's not very complicated.

-I

>
>
> Otherwise,
>   have a nice day y'all.
>     Josh.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.5 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>
> iD8DBQFCzNvrIwpkR5bKmAsRAuvRAJ47tkAEUFvzzVoOhBAa5LCBBaWvswCfZUpF
> pYULF+i1WrfYiA8NDiCbU8A=
> =6Qgy
> -----END PGP SIGNATURE-----
>