[ome-devel] Harvesting metadata from Omero with OAI-PMH

Caitlin Sticco brianandcaitlin at sbcglobal.net
Tue Feb 9 20:11:40 GMT 2010


Hi everyone,

We’re starting a small pilot project at LOCI, in collaboration with
the UW medical library, to try harvesting metadata from Omero using
OAI-PMH (see below). Our plan is
to demonstrate how a searchable repository of “pre-cataloged”
microscopy records could be created by federating the records from
multiple researchers and using Omero as a public viewing interface for
the data.

OAI-PMH is the Open Archives Initiative Protocol for Metadata
Harvesting: http://www.openarchives.org/. It is used mostly by
libraries and repositories for exchanging XML metadata—for example,
harvesting article records from journal publishers. As I understand
it, our data already meet most of the requirements for working with
OAI-PMH. The implementation requirements are summarized at:
    http://www.oaforum.org/tutorial/english/page4.htm

More details of the implementation requirements are at:
    http://www.openarchives.org/OAI/2.0/guidelines.htm
    http://www.openarchives.org/OAI/2.0/guidelines-repository.htm

One of OAI-PMH's requirements is a datestamp to specify when a record
was last modified. (This allows the harvester to selectively harvest
records that are new, modified, or deleted within a certain date
range.) In the case of Omero, a "record" is an Image. Although Omero
doesn’t allow Image data to be modified right now, various associated
metadata (dataset membership, tags, annotations, etc.) can change. We
are planning to extract the tags that users have added in Omero and
save them in an extension under the Structured Annotation system, so
we would still need a lastModified date for these changes. Fortunately,
the Omero database actually does record a datestamp for each
transaction (if I understood our conversation correctly), so we should
be able to derive from this information the last time a record was
modified.
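
A minimal sketch of the datestamp idea, assuming the per-Image
transaction times have already been read out of the database. The
EVENTS dictionary, the image IDs, and the function names below are
hypothetical placeholders, not Omero API calls. The sketch takes the
latest transaction time as the record's datestamp, formats it in the
UTC form OAI-PMH uses, and applies an inclusive from/until filter the
way a harvester's selective request would:

```python
from datetime import datetime, timezone

# Hypothetical stand-in for per-Image transaction times; in practice
# these would be read from the Omero database.
EVENTS = {
    "image:1": [
        datetime(2010, 1, 5, 12, 0, tzinfo=timezone.utc),
        datetime(2010, 2, 1, 9, 30, tzinfo=timezone.utc),
    ],
    "image:2": [
        datetime(2009, 11, 20, 8, 15, tzinfo=timezone.utc),
    ],
}


def last_modified(image_id, events=EVENTS):
    """Return the most recent transaction time recorded for an Image."""
    return max(events[image_id])


def to_oai_datestamp(dt):
    """Format a timezone-aware datetime as an OAI-PMH UTC datestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def select_records(from_=None, until=None, events=EVENTS):
    """List (image_id, datestamp) pairs whose datestamp lies in the
    inclusive range [from_, until]; a bound of None means no limit."""
    selected = []
    for image_id in sorted(events):
        stamp = last_modified(image_id, events)
        if from_ is not None and stamp < from_:
            continue
        if until is not None and stamp > until:
            continue
        selected.append((image_id, to_oai_datestamp(stamp)))
    return selected
```

For example, calling select_records with from_ set to 1 January 2010
(UTC) would return only "image:1" in this made-up data set, since
"image:2" was last touched in 2009.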

Our goal is to write an OAI-PMH repository (server-side layer) for
Omero that exposes the metadata in XML form, via URLs such as:
    http://open.microscopy.wisc.edu/OAI-script?verb=GetRecord&identifier=oai:arXiv.org:hep-th/9901001&metadataPrefix=oai_dc

For a full description of the protocol's requirements, see:
    http://www.openarchives.org/OAI/openarchivesprotocol.html
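
To make the XML side concrete, here is a rough sketch of what a
GetRecord response in the oai_dc format could look like when built
with Python's standard library. The function name, its parameters, and
the choice of Dublin Core fields (title plus one subject per tag) are
assumptions for illustration only; a real response would also carry
the schemaLocation attributes the protocol document describes, which
are left out here for brevity.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Serialize with the OAI-PMH namespace as the default and the usual
# prefixes for the Dublin Core container and elements.
ET.register_namespace("", OAI_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def _oai(tag):
    """Qualify a tag name with the OAI-PMH namespace."""
    return "{%s}%s" % (OAI_NS, tag)


def build_get_record(base_url, identifier, datestamp, response_date,
                     title, subjects):
    """Build a GetRecord response (hypothetical helper) as an XML string."""
    root = ET.Element(_oai("OAI-PMH"))
    ET.SubElement(root, _oai("responseDate")).text = response_date

    # The request element echoes the arguments and holds the base URL.
    request = ET.SubElement(root, _oai("request"), {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": "oai_dc",
    })
    request.text = base_url

    record = ET.SubElement(ET.SubElement(root, _oai("GetRecord")),
                           _oai("record"))
    header = ET.SubElement(record, _oai("header"))
    ET.SubElement(header, _oai("identifier")).text = identifier
    ET.SubElement(header, _oai("datestamp")).text = datestamp

    metadata = ET.SubElement(record, _oai("metadata"))
    dc = ET.SubElement(metadata, "{%s}dc" % OAI_DC_NS)
    ET.SubElement(dc, "{%s}title" % DC_NS).text = title
    for subject in subjects:  # assumption: one dc:subject per Omero tag
        ET.SubElement(dc, "{%s}subject" % DC_NS).text = subject

    return ET.tostring(root, encoding="unicode")
```

Mapping tags to dc:subject is only one possible choice; which Dublin
Core elements best fit Omero's metadata is an open question.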

This layer would live at the Django web server level. We would like
some guidance on how to get started with Django to enable such
functionality. Once we can serve properly structured XML, we hope to
use existing OAI-PMH harvester clients.
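
As a starting point for the server-side layer, the request checking
could be sketched independently of any web framework. The function
below is a hypothetical helper (not Django or Omero API): it parses
the query string of a request URL, checks the verb against the six
OAI-PMH verbs, and reports the protocol's badVerb or badArgument error
code when the verb or a required argument is missing or repeated. It
deliberately ignores resumptionToken and does not validate optional
arguments such as from, until, or set.

```python
from urllib.parse import urlsplit, parse_qs

# Required arguments for each of the six OAI-PMH verbs.
REQUIRED_ARGS = {
    "Identify": set(),
    "ListMetadataFormats": set(),
    "ListSets": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListRecords": {"metadataPrefix"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def check_request(url):
    """Return (verb, args, error) for a request URL.

    error is None when the request looks valid, otherwise one of the
    OAI-PMH error codes "badVerb" or "badArgument".
    """
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)

    # The verb must appear exactly once and be one of the known verbs.
    verbs = query.pop("verb", [])
    if len(verbs) != 1 or verbs[0] not in REQUIRED_ARGS:
        return None, {}, "badVerb"
    verb = verbs[0]

    # No other argument may be repeated.
    if any(len(values) > 1 for values in query.values()):
        return verb, {}, "badArgument"
    args = {name: values[0] for name, values in query.items()}

    # Every required argument for this verb must be present.
    if not REQUIRED_ARGS[verb] <= set(args):
        return verb, args, "badArgument"
    return verb, args, None
```

Applied to the example GetRecord URL above, this returns the verb
"GetRecord" with the identifier and metadataPrefix arguments and no
error; a web view would then dispatch on the verb to produce the XML.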

Does this plan make sense? Where is a good place to start looking at
how to implement such functionality within Omero using Django?

Thanks,
Caitlin