[ome-devel] OME-XML Schema (Mis-?)Use of ID token?

Mon Dec 4 15:51:28 GMT 2006

On Dec 1, 2006, at 2:58 PM, Isaak Berg wrote:

> Greetings all,
>
> I am fairly new to XML and have been working with validating via  
> the OME xsd's.  While using Microsoft Visual Studio .NET 2003 for  
> XML validation of the XML from some of the sample documents, the  
> validator used by that development environment pointed out ever so  
> politely that identical ID values were being used multiple times.
>
> Within XML, there is some mention of a validity constraint that  
> each element type in an XML document can only be assigned a given  
> ID once.  If my understanding of the constraint is accurate, (a big  
> IF), the sample documents I have seen thus far violate it by using  
> an attribute name of "ID" everywhere, instead of using "ID" only in  
> the element defining the information  that the particular LSID  
> actually refers to, and "IDRef" everywhere else that only refers  
> to, rather than defines, that information. See http://www.w3.org/TR/ 
> 2000/WD-xml-2e-20000814#one-id-per-el.

As far as I know, there was never a constraint within XML on using  
attributes named "ID", and reference resolution was up to the  
parser.  A referential integrity constraint could be built using  
XPath, but these types of constraints were never included in OME XML.
The "One ID per element type" is certainly being obeyed by the  
schema.  An element referring to another element is a different type  
than the element being referred to.  I think the difference with MS's  
validators is that they assume that an attribute named "ID" must be a  
"TokenizedType" (i.e. an actual XML ID) rather than a regular old  
attribute (a "StringType") called "ID", which is the default.   
Without making this assumption the XML is valid because we're just  
using "StringType" attributes, some of which are called "ID".
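
To make the "one ID per element type" reading concrete, here is a minimal
Python sketch of the kind of check a parser-side tool could perform.  It is
only an illustration, not part of OME: the sample document, the LSID
strings, and the convention that a referring element is named "XRef" for a
defining element "X" are all assumptions made for the example.  It reports
ID values defined more than once for the same element type, and references
that resolve to nothing.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document; element names and LSIDs are illustrative only.
SAMPLE = """<OME>
  <Project ID="urn:lsid:example.org:Project:1"/>
  <Dataset ID="urn:lsid:example.org:Dataset:1">
    <ProjectRef ID="urn:lsid:example.org:Project:1"/>
  </Dataset>
  <Image ID="urn:lsid:example.org:Image:1">
    <DatasetRef ID="urn:lsid:example.org:Dataset:2"/>
  </Image>
</OME>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def check_ids(xml_text):
    """Return (duplicates, dangling) for the ID attributes in xml_text.

    Assumption: an element whose name ends in 'Ref' refers to the element
    type named by the rest of its name; every other element defines its ID.
    """
    root = ET.fromstring(xml_text)
    defined = Counter()   # (element type, ID value) -> number of definitions
    refs = []             # (referenced element type, ID value)
    for el in root.iter():
        value = el.get("ID")
        if value is None:
            continue
        name = local_name(el.tag)
        if name.endswith("Ref"):
            refs.append((name[:-3], value))
        else:
            defined[(name, value)] += 1
    duplicates = sorted(key for key, count in defined.items() if count > 1)
    dangling = sorted(ref for ref in refs if ref not in defined)
    return duplicates, dangling


duplicates, dangling = check_ids(SAMPLE)
print("duplicate definitions:", duplicates)
print("dangling references:", dangling)
```

In this sample the Project LSID appears twice, once on Project and once on
ProjectRef, yet nothing is reported as a duplicate because the two elements
are of different types.  The DatasetRef pointing at a Dataset that does not
exist is reported as dangling.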

>
> I am not sure whether the validator in use by VS.NET2003 is product- 
> specific or an implementation of the .NET or MS_XML validation  
> APIs, but it sure would be nice (for both developers and users) if  
> OME could update the schema to use IDRefs, so that documents can be  
> generated in compliance with the validity constraint and still  
> validate correctly with ome.xsd.

I think that the problem arises because MS's validators make an
assumption about attribute type based on the attribute name (a
holdover from Hungarian notation, maybe?).  I don't think type can be
inferred from a variable's name in most circumstances - especially
when there is the opportunity to specify the type explicitly.  I
wonder if this assumption can even be overridden by explicitly
specifying that the ID attributes are in fact "StringType", rather
than the validator-imposed "TokenizedType".

>
> Note that I did check against the "definitive" validator referenced  
> by Ilya in April '05 (version 2.9.1 of the Windows version).
> " The definitive XML Validator is http://www.w3.org/2001/03/webdata/ 
> xsv,  though validation should pass in TurboXML as well."
> While that validator is indeed very nice,  it did not flag the  
> multiple ID assignments.... (Neither does XML Notepad 2006).
> However if the schema is updated to use IDRefs instead of ID's, all  
> instance documents already generated to date will be invalid on t

Therein lies the rub.  I can see how changing the name of the ID
attribute in reference elements could be clearer - especially if
one is a believer in identifying the type in the variable name (as
per Hungarian notation).  I don't think it's fair to say that the
other way is "invalid", however.  Since this would invalidate all
extant OME XML documents, I would be hesitant to make this change,  
but we are preparing an update to OME XML and this may be the right  
time to do it if others agree.  It certainly wouldn't be the first  
time we've programmed around invalid assumptions made by MS software. ;)

>
> I hope we all agree that it is beneficial to everyone to ensure
> that OME-XML (and OME-TIFF) documents and schemas can be used
> together in a manner compliant with the XML
> specification.
>
> I've attached a version of ome.xsd with the proposed changes made  
> so that you can difference it versus
> http://www.openmicroscopy.org/XMLschemas/OME/FC/ome.xsd.

Thanks for that - it certainly seems like a straightforward change.
The change would have to be explicit about the MS assumption though -  
these attribute types would have to change from the (implicit)  
"StringType" default to the explicit "TokenizedType".  This way the  
MS assumption would be consistent with validators that make no  
assumption about type from the attribute name.  The backwards- 
compatibility of existing documents is the major issue, I think.
It may be worthwhile to see if the MS "TokenizedType" assumption can
be overridden in the schema by explicitly declaring these attributes
as "StringType" - would you mind seeing if that's possible?  If so,
this change would allow everything to work without having to  
invalidate old XML documents.
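
As a starting point for that experiment, a minimal Python sketch that lists
how each attribute named "ID" is declared in a schema might look like the
following.  The schema fragment is hypothetical and is not taken from
ome.xsd; the only thing assumed about the real schema is that it uses the
standard XML Schema namespace.  An attribute declared with neither a type
nor an inline simpleType falls back to anySimpleType in XML Schema, so the
sketch reports that case separately.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment for illustration; not copied from ome.xsd.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Image">
    <xsd:complexType>
      <xsd:attribute name="ID" type="xsd:string" use="required"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="ImageRef">
    <xsd:complexType>
      <xsd:attribute name="ID" use="required"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def id_attribute_types(schema_text, attr_name="ID"):
    """Return (owner name, declared type) for each attribute called attr_name."""
    found = []

    def walk(node, owner):
        # Remember the nearest enclosing named element or complexType.
        if node.tag in (XS + "element", XS + "complexType") and node.get("name"):
            owner = node.get("name")
        if node.tag == XS + "attribute" and node.get("name") == attr_name:
            if node.get("type") is not None:
                declared = node.get("type")
            elif node.find(XS + "simpleType") is not None:
                declared = "(inline simpleType)"
            else:
                declared = "(no type: anySimpleType)"
            found.append((owner, declared))
        for child in node:
            walk(child, owner)

    walk(ET.fromstring(schema_text), None)
    return found


for owner, declared in id_attribute_types(SCHEMA):
    print(owner, "ID ->", declared)
```

Running something like this over both the current schema and the proposed
one would show exactly which declarations differ before trying them in each
validator.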

>
> Incidentally, I also noticed that when validating the sample
> OME-TIFF data (against my version of ome.xsd modified for the
> proposed OME-TIFF spec), the validating parser gave me an error for
> the "Locked" attribute of the Dataset element when its value was the
> empty string.  Is this a deficiency in the schema (was an empty
> value intended to be permitted?  If so, Locked should be some type
> other than boolean), a deficiency in the document (was a default
> value intended?  If so, I believe the entire attribute should have
> been omitted), or is the validating parser I am using being too
> strict on this point?

A blank is not a valid boolean.  If the intent is to specify a  
default boolean, then the attribute should be left out.  As far as I  
know, it is not possible to specify a NULL boolean explicitly (and  
there really should not be a need to either).
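
For reference, the XML Schema boolean type accepts only the literals
"true", "false", "1" and "0" (surrounding whitespace is collapsed first).
A minimal Python sketch of that rule follows; the function names and the
choice of False as the default for an absent Locked attribute are
assumptions made for the example, not something taken from the schema.

```python
# The four literals permitted by the XML Schema boolean type.
XSD_BOOLEAN_LITERALS = {"true": True, "1": True, "false": False, "0": False}


def parse_xsd_boolean(raw):
    """Return the bool for a valid xsd:boolean literal, else raise ValueError."""
    literal = raw.strip(" \t\r\n")  # whitespace is collapsed before checking
    if literal not in XSD_BOOLEAN_LITERALS:
        raise ValueError("not a valid xsd:boolean: %r" % raw)
    return XSD_BOOLEAN_LITERALS[literal]


def locked_value(attributes, default=False):
    """Read a Locked attribute from a dict of attributes.

    An absent attribute yields the default (assumed False here); a present
    but empty attribute is an error, matching what the validator reported.
    """
    if "Locked" not in attributes:
        return default
    return parse_xsd_boolean(attributes["Locked"])


print(locked_value({}))                  # attribute omitted: default applies
print(locked_value({"Locked": "true"}))  # explicit value
try:
    locked_value({"Locked": ""})         # empty string is rejected
except ValueError as err:
    print("rejected:", err)
```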

>
> Could someone please provide a use-case scenario of how the "Locked"
> attribute is intended to be used?

Locked simply means that the Dataset's collection of images can no  
longer be altered.  This is the case when Dataset-granularity  
attributes are generated for the dataset.  For example, one could  
perform a calculation over all of the images in a dataset (average  
signal intensity, say).  If the set of images comprising the dataset  
is subsequently altered, then the computed signal intensity would no  
longer be valid.  So the Locked attribute is used to signify that  
there exists a piece of information pertaining to the dataset as a  
whole.
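
The locking behaviour described above can be sketched in a few lines of
Python.  This is only an illustration of the idea, not how OME implements
it: the Dataset class, its method names, and the representation of an image
as a plain list of intensities are all hypothetical.

```python
class LockedDatasetError(Exception):
    """Raised when a locked dataset's image collection would be altered."""


class Dataset:
    def __init__(self, images=()):
        self.images = list(images)   # each image: a list of pixel intensities
        self.attributes = {}         # dataset-granularity attributes
        self.locked = False

    def add_image(self, image):
        if self.locked:
            raise LockedDatasetError("dataset is locked; images cannot change")
        self.images.append(image)

    def set_dataset_attribute(self, name, value):
        # A value computed over the whole dataset is only meaningful while
        # the set of images stays fixed, so storing one locks the dataset.
        self.attributes[name] = value
        self.locked = True


dataset = Dataset([[10, 20], [30, 40]])
total = sum(sum(image) for image in dataset.images)
count = sum(len(image) for image in dataset.images)
dataset.set_dataset_attribute("MeanIntensity", total / count)

try:
    dataset.add_image([50, 60])
except LockedDatasetError as err:
    print("rejected:", err)
```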

>
> The sample OME-TIFFs are also missing a Filter child element of the  
> Image element.
> Needless to say, we are anxiously awaiting the publication of an  
> OME-TIFF schema!

The OME-TIFF schema is no different in this respect from OME-XML.  It  
only differs in the structure of the elements within the "Pixels"  
element.  There are still a few minor issues with the XML in
OME-TIFF.  We anticipate that these will be resolved, as we've just
successfully connected up Bio-Formats with OME, which will allow us to
identify and iron out all of these problems systematically.

Thanks,
Ilya

>
>
>
> Isaak Berg
> Software Developer
>
> isaak at cimaging.net
>
>
> Compix Inc., Imaging Systems
>
> 109 Nicholson Road
>
> Sewickley, PA 15143
>
> 412-741-7920
>
> 412-741-7930 Fax
>
> www.cimaging.net
>
>
>
>
> <ome.xsd>