[ome-devel] OME-XML Schema (Mis-?)Use of ID token?
Ilya Goldberg
igg at nih.gov
Mon Dec 4 15:51:28 GMT 2006
On Dec 1, 2006, at 2:58 PM, Isaak Berg wrote:
> Greetings all,
>
> I am fairly new to XML and have been working with validating via
> the OME xsd's. While using Microsoft Visual Studio .NET 2003 for
> XML validation of the XML from some of the sample documents, the
> validator used by that development environment pointed out ever so
> politely that identical ID values were being used multiple times.
>
> Within XML, there is some mention of a validity constraint that
> each element type in an XML document can only be assigned a given
> ID once. If my understanding of the constraint is accurate, (a big
> IF), the sample documents I have seen thus far violate it by using
> an attribute name of "ID" everywhere, instead of using "ID" only in
> the element defining the information that the particular LSID
> actually refers to, and "IDRef" everywhere else that only refers
> to, rather than defines, that information. See
> http://www.w3.org/TR/2000/WD-xml-2e-20000814#one-id-per-el.
As far as I know, there was never a constraint within XML on using
attributes named "ID", and reference resolution was up to the
parser. A referential integrity constraint could be built using
XPath, but these types of constraints were never included in OME XML.
The "One ID per element type" is certainly being obeyed by the
schema. An element referring to another element is a different type
than the element being referred to. I think the difference with MS's
validators is that they assume that an attribute named "ID" must be a
"TokenizedType" (i.e. an actual XML ID) rather than a regular old
attribute (a "StringType") called "ID", which is the default.
Without making this assumption the XML is valid because we're just
using "StringType" attributes, some of which are called "ID".
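To make the distinction concrete, here is a minimal Python sketch of the
two readings. It is only an illustration, not part of OME: the sample
XML, the example.org LSIDs, and the function name duplicate_ids are all
made up. If "ID" is treated as a true XML ID, each value may appear only
once in the whole document, so a definition and a reference sharing one
LSID collide. If "ID" is just a string attribute, nothing is violated,
and the same value never repeats within a single element type.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample, shaped like the pattern discussed in this thread:
# the defining element and the referring element both use an "ID" attribute.
SAMPLE = """<OME>
  <Dataset ID="urn:lsid:example.org:Dataset:1"/>
  <Image ID="urn:lsid:example.org:Image:1">
    <DatasetRef ID="urn:lsid:example.org:Dataset:1"/>
  </Image>
</OME>"""


def duplicate_ids(xml_text, per_element_type=False):
    """Return the "ID" values (or (tag, value) pairs) that occur more than once.

    per_element_type=False mimics a validator that treats "ID" as a real
    XML ID, which must be unique across the whole document.
    per_element_type=True only looks for repeats within one element name.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    for elem in root.iter():
        value = elem.get("ID")
        if value is None:
            continue  # element carries no "ID" attribute
        key = (elem.tag, value) if per_element_type else value
        counts[key] += 1
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    print("document-wide duplicates:", duplicate_ids(SAMPLE))
    print("per-element-type duplicates:", duplicate_ids(SAMPLE, per_element_type=True))
```

The document-wide check reports the shared Dataset LSID, which is the
kind of complaint described above, while the per-element-type check
reports nothing.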
>
> I am not sure whether the validator in use by VS.NET2003 is product-
> specific or an implementation of the .NET or MS_XML validation
> APIs, but it sure would be nice (for both developers and users) if
> OME could update the schema to use IDRefs, so that documents can be
> generated in compliance with the validity constraint and still
> validate correctly with ome.xsd.
I think that the problem arises because MS's validators make an
assumption about attribute type based on the attribute name (a
holdover from Hungarian notation, maybe?). I don't think type can be
inferred from a variable's name in most circumstances - especially
when there is the opportunity to specify the type explicitly. I
wonder if this assumption can even be overridden by explicitly
specifying that the ID attributes are in fact "StringType", rather
than the validator-imposed "TokenizedType".
>
> Note that I did check against the "definitive" validator referenced
> by Ilya in April '05 (version 2.9.1 of the Windows version).
> "The definitive XML Validator is
> http://www.w3.org/2001/03/webdata/xsv, though validation should
> pass in TurboXML as well."
> While that validator is indeed very nice, it did not flag the
> multiple ID assignments.... (Neither does XML Notepad 2006).
> However if the schema is updated to use IDRefs instead of ID's, all
> instance documents already generated to date will be invalid on t
Therein lies the rub. I can see how changing the name of the ID
attribute in reference elements could be clearer - especially if
one is a believer in identifying the type in the variable name (as
per Hungarian notation). I don't think it's fair to say that the
other way is "invalid", however. Since this would invalidate all
extant OME XML documents, I would be hesitant to make this change,
but we are preparing an update to OME XML and this may be the right
time to do it if others agree. It certainly wouldn't be the first
time we've programmed around invalid assumptions made by MS software. ;)
>
> I hope we all agree that it is beneficial to everyone to ensure
> that OME-XML (and OME-TIFF) documents and schemas
> can be used together in a manner compliant with the XML
> specification.
>
> I've attached a version of ome.xsd with the proposed changes made
> so that you can difference it versus
> http://www.openmicroscopy.org/XMLschemas/OME/FC/ome.xsd.
Thanks for that - it certainly seems like a straightforward change.
The change would have to be explicit about the MS assumption though -
these attribute types would have to change from the (implicit)
"StringType" default to the explicit "TokenizedType". This way the
MS assumption would be consistent with validators that make no
assumption about type from the attribute name. The backward
compatibility of existing documents is the major issue, I think.
It may be worthwhile to see if the MS "TokenizedType" assumption can
be overridden in the schema by explicitly declaring these attributes
as "StringType" - would you mind seeing if that's possible? If so,
this change would allow everything to work without having to
invalidate old XML documents.
>
> Incidentally, I also noticed that when validating the sample OME-
> TIFFs data (against my version of ome.xsd modified for the proposed
> OME-TIFF spec), the
> validating parser gave me an error for the "Locked" attribute of
> the Dataset element when its value was the empty string.
> Is this a schema deficiency (was an empty value intended to be
> permitted? If so, the type of Locked should be something other than
> boolean), a problem in the document (was a default value intended?
> If so, I believe the entire attribute should have been omitted), or
> is the validating parser I am using being too strict on this point?
A blank is not a valid boolean. If the intent is to specify a
default boolean, then the attribute should be left out. As far as I
know, it is not possible to specify a NULL boolean explicitly (and
there really should not be a need to either).
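As a rough illustration of the rule, here is a small Python sketch of
how a reader might treat the attribute. The function names are
hypothetical and this is not OME code. It relies on the XML Schema
definition of xs:boolean, whose only legal literals are "true",
"false", "1" and "0" (after surrounding whitespace is collapsed), and it
treats a missing attribute as "not specified" so the caller can fall
back to the default.

```python
import xml.etree.ElementTree as ET

# The four literals that XML Schema allows for xs:boolean.
XS_BOOLEAN_LITERALS = {"true", "false", "1", "0"}


def is_valid_xs_boolean(text):
    """True if text is a legal xs:boolean literal after whitespace collapse."""
    return " ".join(text.split()) in XS_BOOLEAN_LITERALS


def parse_locked(dataset_elem):
    """Read the Locked attribute of a Dataset element (hypothetical helper).

    Returns None when the attribute is omitted, so the caller can apply
    the default; raises ValueError for anything that is not an xs:boolean,
    including the empty string.
    """
    raw = dataset_elem.get("Locked")
    if raw is None:
        return None
    if not is_valid_xs_boolean(raw):
        raise ValueError("Locked=%r is not a valid xs:boolean" % raw)
    return " ".join(raw.split()) in ("true", "1")


if __name__ == "__main__":
    for snippet in ('<Dataset/>', '<Dataset Locked="true"/>', '<Dataset Locked=""/>'):
        try:
            print(snippet, "->", parse_locked(ET.fromstring(snippet)))
        except ValueError as err:
            print(snippet, "-> error:", err)
```

Omitting the attribute and writing Locked="" are deliberately handled
differently here: the first is legal and means "use the default", the
second is a validation error.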
>
> Could someone please provide a use-case scenario of how the "Locked"
> attribute is intended to be used?
Locked simply means that the Dataset's collection of images can no
longer be altered. This is the case when Dataset-granularity
attributes are generated for the dataset. For example, one could
perform a calculation over all of the images in a dataset (average
signal intensity, say). If the set of images comprising the dataset
is subsequently altered, then the computed signal intensity would no
longer be valid. So the Locked attribute is used to signify that
there exists a piece of information pertaining to the dataset as a
whole.
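A toy Python sketch of that behaviour, purely to illustrate the idea:
the class and method names are invented for this example and do not
correspond to any OME API, and each "image" is reduced to a single
intensity number. Storing a dataset-level attribute locks the dataset,
and a locked dataset refuses changes to its set of images.

```python
class DatasetLockedError(Exception):
    """Raised when code tries to change the images of a locked dataset."""


class Dataset:
    """Hypothetical stand-in for a dataset; images are plain numbers here."""

    def __init__(self, images=()):
        self.images = list(images)
        self.attributes = {}   # dataset-granularity attributes
        self.locked = False

    def add_image(self, image):
        if self.locked:
            raise DatasetLockedError("dataset is locked; its images cannot change")
        self.images.append(image)

    def set_dataset_attribute(self, name, value):
        # A value computed over the whole dataset is only meaningful while
        # the set of images stays fixed, so recording one locks the dataset.
        self.attributes[name] = value
        self.locked = True


def average_intensity(dataset):
    """Average of the per-image intensities (the example from the text)."""
    return sum(dataset.images) / len(dataset.images)


if __name__ == "__main__":
    ds = Dataset([10.0, 30.0])
    ds.set_dataset_attribute("AverageIntensity", average_intensity(ds))
    try:
        ds.add_image(50.0)
    except DatasetLockedError as err:
        print("refused:", err)
```

Before any dataset-level attribute is stored, add_image works as usual;
the lock only appears once such an attribute exists.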
>
> The sample OME-TIFFs are also missing a Filter child element of the
> Image element.
> Needless to say, we are anxiously awaiting the publication of an
> OME-TIFF schema!
The OME-TIFF schema is no different in this respect from OME-XML. It
only differs in the structure of the elements within the "Pixels"
element. There are still a few minor issues with the XML in
OME-TIFF. We anticipate that these will be resolved as we've just
successfully connected up BioFormat with OME, which will allow us to
identify and iron out all of these problems systematically.
Thanks,
Ilya
>
>
>
> Isaak Berg
> Software Developer
>
> isaak at cimaging.net
>
>
> Compix Inc., Imaging Systems
>
> 109 Nicholson Road
>
> Sewickley, PA 15143
>
> 412-741-7920
>
> 412-741-7930 Fax
>
> www.cimaging.net
>
>
>
>
> <ome.xsd>
> _______________________________________________
> ome-devel mailing list
> ome-devel at lists.openmicroscopy.org.uk
> http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel