[ome-devel] Re: INCell images/importer

Tue Oct 11 18:36:42 BST 2005

On Oct 5, 2005, at 3:45 PM, Bernd Jagla wrote:

> Ilya, and all,
>
> I would store the information from the "run"-files on the dataset level and
> come up with additional attributes on the dataset level to store that
> information.

You can use Datasets, but you can also use existing Plate/Well/Screen  
types.
We are in the process of reworking these slightly, the proposed model  
is here:
http://cvs.openmicroscopy.org.uk/tiki/tiki-index.php?page=DataModelProposal
There is some discussion on the details of the fields in the classes,  
but the overall class structure will basically be what's proposed  
there.

We don't have a handy diagram of the current class structure, but you
can look at it yourself either using a DB schema diagramming tool, or by
manually inspecting the STs here:
OME/src/xml/OME/Core/Screen.ome

Since the STs for the new class model aren't defined yet, probably the
thing to do is to use the old one for now.  The new one is a superset
of the old one, and we will probably (definitely?) have an update
script that will convert from the old to the new.  Maybe by the time
you get that far, the new one will be in place (we're starting to think
about a release in the short term).

None of these (in either old or new) are dataset attributes.  The  
dataset is a separate organizational structure.  The idea is that you  
should be able to easily collect all the images on a certain plate or  
in a certain screen, and make a new dataset out of them.  Usually  
people split these into datasets in several different ways (controls  
vs. experimentals, plate by plate, etc) for analytical/organizational  
purposes, which are different from the image's associations with plates  
and screens.
>
>
>
> *      How do I access those attributes? I would do it from the importGroup
> function (that I have to write) using the newAttribute function (I will
> look through a few examples to find a clue on how to do that, so far I have
> only seen attributes for images).

The ImportEngine together with the ImportManager manages the  
module_executions you need to register your import.  There are Image,  
Dataset and Global module executions for producing image attributes,  
global attributes and dataset attributes.
# One of these per OME image (not necessarily per file)
my $image_mex = OME::Tasks::ImportManager->getImageImportMEX($image);
# One of these for the entire import
my $global_mex = OME::Tasks::ImportManager->getGlobalImportMEX();

Then,
my $plate_attr = $factory->newAttribute('Plate',undef,$global_mex,{
	Name => $plate_name,
	ExternalReference => 'abcdefg1234567', # A foreign key into a different database
	# The Plate ST has a direct Screen reference, which is wrong - don't use it.
	# Plates and Screens are many-to-many, which has always been the case.
});

my $screen_attr = $factory->newAttribute('Screen',undef,$global_mex,{
	Name => $screen_name,
	Description => $description,           # Free text
	ExternalReference => 'abcdefg1234567', # A foreign key into a different database
});

# You have to manually define the mapping object.  This may be done
# away with in the future, but you have to do it yourself for now.
my $plate_screen_map_attr = $factory->newAttribute('PlateScreen',undef,$global_mex,{
	Plate => $plate_attr,
	Screen => $screen_attr,
});

# Note that you probably want to do a search for the above three if you
# are interested in uniqueness based on Name, Description or
# ExternalReference.
# These calls do not enforce any unique constraints, but they will
# produce objects with unique IDs.

# The old-style combines sample info with the Plate-Image mapping object
# The new style breaks these up into separate classes
my $well_attr = $factory->newAttribute('ImagePlate',$image,$image_mex,{
	Plate  => $plate_attr,
	Sample => $sample_number,
	Well   => $address,
});
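
The note above about searching before creating can be sketched as a small
"find or create" helper.  This is only a sketch under stated assumptions:
MockFactory is a hypothetical stand-in so the example runs on its own, and
its findAttribute method is an assumed lookup call, not a confirmed OME
signature.  Check the OME::Factory documentation for the real search method
before relying on it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the OME factory (NOT the real OME::Factory).
# It keeps attributes in memory so the sketch is runnable by itself.
package MockFactory;

sub new { return bless { attrs => [], next_id => 1 }, shift }

# Assumed lookup: return the first attribute of $type matching all criteria.
sub findAttribute {
    my ($self, $type, $criteria) = @_;
    ATTR: for my $attr (@{ $self->{attrs} }) {
        next unless $attr->{type} eq $type;
        for my $key (keys %$criteria) {
            next ATTR unless defined $attr->{data}{$key}
                && $attr->{data}{$key} eq $criteria->{$key};
        }
        return $attr;
    }
    return;
}

# Same argument order as the newAttribute calls shown above.
sub newAttribute {
    my ($self, $type, $target, $mex, $data) = @_;
    my $attr = { id => $self->{next_id}++, type => $type, data => {%$data} };
    push @{ $self->{attrs} }, $attr;
    return $attr;
}

package main;

# Return an existing attribute matching $criteria, or create a new one.
# $extra holds fields that are stored but not used for matching.
sub find_or_create {
    my ($factory, $type, $mex, $criteria, $extra) = @_;
    my $found = $factory->findAttribute($type, $criteria);
    return $found if $found;
    return $factory->newAttribute($type, undef, $mex,
        { %$criteria, %{ $extra || {} } });
}

my $factory    = MockFactory->new;
my $global_mex = 'global-mex';    # placeholder for the real global import MEX

my $plate_a = find_or_create($factory, 'Plate', $global_mex, { Name => 'Plate 1' });
my $plate_b = find_or_create($factory, 'Plate', $global_mex, { Name => 'Plate 1' });
print "second call reused the existing plate\n"
    if $plate_a->{id} == $plate_b->{id};
```

Which fields count as the uniqueness key (Name, ExternalReference, or both)
is your decision; the helper only matches on whatever you pass as criteria.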

>
> *      How should I distinguish between different plates? Would this be a
> dataset underneath a dataset or a parallel dataset? I could probably also
> create different new datasets from the run-file and link later the images to
> those dataset. How would I do this the best way?

Currently the pattern is to leave this up to the user.  The important
thing at import time is to record the Well/Plate/Screen association
of each image.  Later on the user can organize the images however they
want into datasets.

>
> *      Since all the information about a dataset is in a run-file, I would
> like to remove all other files from the list and therefore allow only one
> file (the run-file) to be in the list at the beginning. Would this be OK?
> This would mean that the "run"-files have to be processed first.

The getGroups() method that you write for your importer has access to  
all files still in the import queue, and you can delete them from the  
import queue in this method so that subsequent importers don't get to  
see them.  The getGroups() methods for the various importers are called  
in the order specified in the import_formats configuration variable  
(select value from configuration where name = 'import_formats').  So  
you have full control over what gets called first and what files the  
other importers get to see if your importer is in front of them in the  
queue.  Generally, importers will ignore files they don't recognize, so
the purpose of the ordering is to put more general importers (e.g.
TIFF) after the more specific importers (e.g. all the TIFF variants).
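
To make the effect of the ordering concrete, here is a small self-contained
simulation.  It is a sketch, not the ImportEngine itself: make_importer, the
importer names and the file patterns are all made up for illustration, and
the queue is modelled as a plain hash keyed by path.

```perl
use strict;
use warnings;

# Build a hypothetical importer: a name plus a getGroups-like sub that
# removes the files it recognizes from the shared queue and returns one
# group per claimed file.
sub make_importer {
    my ($name, $pattern) = @_;
    return {
        name      => $name,
        getGroups => sub {
            my ($queue) = @_;
            my @groups;
            for my $path (sort keys %$queue) {
                next unless $path =~ $pattern;    # ignore unrecognized files
                push @groups,
                    { importer => $name, file => delete $queue->{$path} };
            }
            return \@groups;
        },
    };
}

# The more specific importer comes first, the general one after it.
my @import_formats = (
    make_importer('VariantTIFF', qr/\.variant\.tif$/),
    make_importer('TIFF',        qr/\.tif$/),
);

my %queue = map { $_ => $_ } qw(a.variant.tif b.tif notes.txt);

my @claimed;
for my $importer (@import_formats) {
    push @claimed, @{ $importer->{getGroups}->(\%queue) };
}

print "$_->{importer} claimed $_->{file}\n" for @claimed;
print "left in queue: ", join(', ', sort keys %queue), "\n";
```

If the two importers were swapped, the general TIFF pattern would also match
a.variant.tif and the specific importer would never see it.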

>
>
>
> In the getGroups function I would then add all the images (frm-files) to the
> list.

Actually, you would add them to your "group" and take them off the  
import queue (the hash reference you get as a parameter to getGroups).
The groups you generate are opaque to the rest of the ImportEngine, so  
you can structure them any way you want.  The getGroups() method  
returns a list of groups, and at a later time, your importGroup($group)  
method will be called once for each group in the list you returned from  
getGroups.  What you store in $group is up to you, but obviously a  
reference to the File objects in your group would be most convenient.

The purpose of the group is to collect files that constitute a single  
OME Image.  This is not enforced, but it's probably good practice to
stay consistent with other importers.  It would probably make the most  
sense to create the global objects in getGroups() (i.e. Plate, Screen,  
and PlateScreen), and store references to them in your $group objects,  
which will be one per image.  Then, importGroup() would import a single  
image per call, and create the 'ImagePlate' ST to associate it with the  
appropriate Plate.
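
The division of labour described above could look roughly like the sketch
below.  Everything here is a stand-in: SketchImporter is not a real OME
class, _new_attribute only records what $factory->newAttribute would
create, the well list and the image paths are made up, and returning an
array reference from getGroups is an assumption.  For brevity only the
Plate is created in getGroups; Screen and PlateScreen would go in the same
place.

```perl
use strict;
use warnings;

package SketchImporter;

sub new { return bless { created => [] }, shift }

# Stand-in for $factory->newAttribute(...): records what would be created.
sub _new_attribute {
    my ($self, $type, $data) = @_;
    my $attr = { type => $type, %$data };
    push @{ $self->{created} }, $attr;
    return $attr;
}

# $queue is the hash reference of files still waiting to be imported.
sub getGroups {
    my ($self, $queue) = @_;
    my @groups;
    for my $path (grep { /\.run$/ } sort keys %$queue) {
        delete $queue->{$path};    # hide the run file from later importers
        my $plate = $self->_new_attribute('Plate', { Name => $path });
        # A real importer would parse the run file; these wells are made up.
        for my $well (qw(A1 A2)) {
            push @groups, {
                plate      => $plate,    # reference to the global object
                well       => $well,
                image_path => "$path.$well.frm",    # hypothetical path
            };
        }
    }
    return \@groups;    # one group per future OME image
}

# Called once per group; creates one image and links it to its plate.
sub importGroup {
    my ($self, $group) = @_;
    my $image = { name => $group->{image_path} };  # stand-in for an OME::Image
    $self->_new_attribute('ImagePlate',
        { Plate => $group->{plate}, Well => $group->{well}, image => $image });
    return $image;
}

package main;

my $importer = SketchImporter->new;
my %queue    = ('screen1.run' => 1, 'readme.txt' => 1);

my $groups = $importer->getGroups(\%queue);
my @images = map { $importer->importGroup($_) } @$groups;

printf "%d groups, %d images, %d files left in queue\n",
    scalar @$groups, scalar @images, scalar keys %queue;
```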

> *      Should I modify the attributes of the dataset already in the
> getGroups function? This would make my life easier when importing the frm
> images, because I can then remove even the run-file from the file list (no
> need for an importGroups function) and do the rest of the work in the frm
> importer.

The importGroup() method returns the OME Image object(s) it creates.
It's probably a good idea to keep that behavior.  Nothing will break
if you create the image in getGroups instead, but your importer would then
work differently from all the other importers (which isn't great).

> *      How do I add image files (frm files) to the file list? I use upload
> (as Ilya said) but then I don't know how to register the connection in the
> "regular" way and I don't see the new file being processed (I guess this has
> something to do with the registration).

The simplest way is to call:
my $OF_attr = $self->__touchOriginalFile($file,"InCell 3000");
OME::Tasks::ImportManager->markImageFiles($image,$OF_attr);

Here $file is what you got from the
OME::Image::Server->uploadFile($path_to_file) call, which you use to upload
a file that was not in the import queue (whose path you got from the run
file), and $image is the OME::Image object you create in importGroup().

-Ilya