[ome-devel] Re: Code Narratives

Thu Feb 2 22:16:38 GMT 2006

I'm CCing OME-Devel because others may be curious about
these questions.

> On Jan 30, 2006, at 10:51 AM, Tony Scelfo wrote:
>> I want to do the following progression.
>> 1. Get a dataset.
>> 2. Using the dataset, get all the images in the dataset.
>> 3. Get a list of all the Semantic Types with numerical values that the
>> images have in common.
>> 4. Present the user with a list of the names of those Semantic Types.
>> 5. Based on user input through drop down boxes, retrieve the values of
>> the Semantic Types for each image in the dataset.
>> 6. Plot the values.

As Harry pointed out below, (3) seems to be the crux of your problem. 
Let me know if there is another point you would like addressed.

There are three strategies for doing this, with decreasing use of the
provenance data model. All would need to be implemented in manager
classes. They are based on Chain Executions, Module Executions, and
Semantic Types. They ensure varying degrees of uniformity among the data
displayed (summarized in the table below). If properly implemented,
they would each have interfaces with the same degree of complexity.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pastedGraphic1.pdf
Type: application/pdf
Size: 8608 bytes
Desc: not available
Url : http://lists.openmicroscopy.org.uk/pipermail/ome-devel/attachments/20060202/162e7184/pastedGraphic1.pdf
-------------- next part --------------

CHEX based retrieval
------------------------------------

The only drawback to this strategy is that it won't show annotations.
In your scenario, each entry in your list would be a triplet of:
< CHEX, Node, ST >. I would implement this as a triple loop that
populates that list with unique entries. The contents of each loop
are:

1) Find all non-stale Chain Executions (CHEXs) for the dataset in
question.
	A chain is executed on an entire dataset. The dataset is locked only
if a chain combines data from many images (e.g. calculate average blobs
per image in the dataset). Many chains, such as the import chain, do
not lock datasets. If images are added to or removed from a dataset, then
all CHEXs run on that dataset are stale because the set of images they
refer to no longer matches the images in the dataset. We need to add a
boolean column named "stale" to the CHEX package, and set it to true
when the images within a dataset change. Until then, a more intensive
query will be needed to determine if a CHEX is stale. A dataset manager
method needs to be written for this functionality, so the
implementation can be switched later.
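
Until the "stale" column exists, the more intensive check boils down to
comparing two sets of image ids. Here is a minimal sketch of that
comparison. It assumes you can already obtain the ids of the images a CHEX
was run against and the ids of the images currently in the dataset; the
name chex_is_stale and both argument lists are hypothetical and not part of
the OME API.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the set of image ids the CHEX ran on
# differs from the dataset's current image ids, and 0 otherwise.
sub chex_is_stale {
	my ( $chex_image_ids, $dataset_image_ids ) = @_;
	my %ran_on  = map { $_ => 1 } @$chex_image_ids;
	my %current = map { $_ => 1 } @$dataset_image_ids;
	# Different set sizes mean images were added or removed.
	return 1 if scalar( keys %ran_on ) != scalar( keys %current );
	# Same size: stale if any current image was not part of the run.
	foreach my $id ( keys %current ) {
		return 1 unless exists $ran_on{$id};
	}
	return 0;
}
```

Wrapping this in a dataset manager method would let the body be swapped for
a simple column lookup later without touching callers.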

2) Find all Nodes in each CHEX that produced image granularity outputs.
my @nodes = (
	# First find Nodes with declared image outputs.
	$factory->findObjects( 'OME::AnalysisChain::Node',
		# limit to the list of CHEXs
		'analysis_chain.analysis_chain_execution' => $CHEX,
		# limit to modules with declared image granularity outputs
		'module.outputs.semantic_type.granularity' => 'I',
		# don't allow duplicates.
		__distinct => 'id'
	),
	# Add Nodes with undeclared image outputs.
	$factory->findObjects( 'OME::AnalysisChain::Node',
		# limit to the list of CHEXs
		'analysis_chain.analysis_chain_execution' => $CHEX,
		# limit to module executions with undeclared image granularity outputs
		'executions.module_execution.untypedOutputs.semantic_type.granularity' => 'I',
		# don't allow duplicates.
		__distinct => 'id'
	) );
This would need a further check to remove duplicates, because a Node can
be returned by both queries. Ideally, factory would allow an 'OR' around
the granularity blocks, which would give us what we want with a single
query. The documentation for factory says complex calls are meant for SQL.
A valid strategy for a manager class would be to directly issue an SQL
call for the ids of those Nodes, then load those Nodes through normal
Factory calls & return. The SQL for CHEX id 7 could look like this:

-- Get IDs for Nodes for a given list of CHEXs that have typed &
-- untyped outputs with STs of image granularity
SELECT DISTINCT ON (nodes.analysis_chain_node_id) nodes.analysis_chain_node_id
FROM analysis_chain_nodes nodes
WHERE
	-- Typed outputs with image granularity
	nodes.analysis_chain_node_id in (
		SELECT nodes.analysis_chain_node_id
		FROM analysis_chain_nodes nodes, formal_outputs, semantic_types STs,
			analysis_chain_executions CHEXs
		WHERE
			nodes.module_id = formal_outputs.module_id AND
			formal_outputs.semantic_type_id = STs.semantic_type_id AND
			STs.granularity = 'I' AND
			nodes.analysis_chain_id = CHEXs.analysis_chain_id AND
			CHEXs.analysis_chain_execution_id = 7
	) OR
	-- Untyped outputs with image granularity
	nodes.analysis_chain_node_id in (
		SELECT nodes.analysis_chain_node_id
		FROM analysis_chain_nodes nodes, module_executions MEXs,
			semantic_type_outputs, semantic_types STs,
			analysis_chain_executions CHEXs
		WHERE
			nodes.module_id = MEXs.module_id AND
			MEXs.module_execution_id = semantic_type_outputs.module_execution_id AND
			semantic_type_outputs.semantic_type_id = STs.semantic_type_id AND
			STs.granularity = 'I' AND
			nodes.analysis_chain_id = CHEXs.analysis_chain_id AND
			CHEXs.analysis_chain_execution_id = 7
	)
;

Parenthetically, it is several times faster to load the typed & untyped  
sections in separate subclauses, presumably because it results in a  
much smaller join.
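
If you stay with the two factory queries instead of SQL, the duplicate
check can be done on object ids. A minimal sketch, using plain hashes as
stand-ins for Node objects; with real objects loaded through the factory
you would presumably key on the object's id accessor instead. The helper
name unique_by_id is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first occurrence of each id, preserving order.
sub unique_by_id {
	my %seen;
	return grep { !$seen{ $_->{id} }++ } @_;
}

# Stand-in data: node 12 is returned by both the typed and untyped queries.
my @typed   = ( { id => 11 }, { id => 12 } );
my @untyped = ( { id => 12 }, { id => 13 } );
my @nodes   = unique_by_id( @typed, @untyped );
print join( ',', map { $_->{id} } @nodes ), "\n";    # prints 11,12,13
```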

3) Given a CHEX and Node, find outputs of image granularity that have
at least one numeric element
	my @STs;
	# First get the STs of typed outputs.
	my @typed_outputs = $node->module->outputs(
		'semantic_type.granularity' => 'I',
		'semantic_type.semantic_elements.data_column.sql_type' => [ 'in',
			[ 'bigint', 'integer', 'smallint', 'double precision', 'real' ] ],
	);
	push @STs, map( $_->semantic_type, @typed_outputs );
	# Next get the STs of untyped outputs that were created for every node
	# execution for this node in this CHEX
	my @NEXs = $chex->node_executions( analysis_chain_node => $node );
	my @MEXs = map( $_->module_execution, @NEXs );
	my @prospectiveUntypedOutputs = $factory->findObjects(
		'OME::ModuleExecution::SemanticTypeOutput',
		module_execution => [ 'in', \@MEXs ],
		'semantic_type.semantic_elements.data_column.sql_type'  => [ 'in',
			[ 'bigint', 'integer', 'smallint', 'double precision', 'real' ] ],
		__distinct => 'semantic_type'
	);
	foreach my $candidateUntypedOutputs ( @prospectiveUntypedOutputs ) {
		# Each node execution has exactly one module execution.
		# If the count of an untyped output is the same as the count of MEXs,
		# then there is one per MEX.
		# This test is needed because image import may have produced
		# different metadata for different images.
		# e.g. One image may have exposure stored in its original file and
		# another may not.
		my $outputCount = $factory->countObjects(
			'OME::ModuleExecution::SemanticTypeOutput',
			semantic_type => $candidateUntypedOutputs->semantic_type,
			module_execution => [ 'in', \@MEXs ]
		);
		# Numeric comparison: both sides are counts.
		push @STs, $candidateUntypedOutputs->semantic_type
			if( $outputCount == scalar( @MEXs ) );
	}

MEX based retrieval
------------------------------------

This strategy will show annotations, but will not cope well with
modules that have been executed multiple times (i.e. with multiple
parameters). The code snippet given below will use the most recently
executed MEX. Ideally, this would be used in conjunction with the CHEX
approach, and only MEXs that lacked node executions would be
considered. Recall that a module has one execution per dataset, or one
per image.
In your scenario, each entry in your list would be a triplet of:
< Module, MEX_list, ST >.

	# Load modules with image granularity outputs and with untyped outputs.
	my @candidateModules = $factory->findObjects( 'OME::Module',
		'outputs.semantic_type.granularity' => 'I',
		'outputs.semantic_type.semantic_elements.data_column.sql_type' => [ 'in',
			[ 'bigint', 'integer', 'smallint', 'double precision', 'real' ] ],
		__distinct => 'id'
	);
	push @candidateModules, $factory->findObjects( 'OME::Module',
		'outputs.semantic_type' => ['is', undef ]
	);
	my @module_ST_list;
	foreach my $module ( @candidateModules ) {
		# add if it has been executed on the whole dataset
		if( my $datasetMEX = $factory->findObject( 'OME::ModuleExecution',
			dataset => $dataset,
			module => $module,
			__order => '!timestamp' )
		) {
			# Load typed outputs
			my @imageOutputs = $module->outputs(
				'semantic_type.granularity' => 'I',
				'semantic_type.semantic_elements.data_column.sql_type' => [ 'in',
					[ 'bigint', 'integer', 'smallint', 'double precision', 'real' ] ]
			);
			# Load untyped outputs
			push( @imageOutputs, $datasetMEX->untypedOutputs(
				'semantic_type.granularity' => 'I',
				'semantic_type.semantic_elements.data_column.sql_type' => [ 'in',
					[ 'bigint', 'integer', 'smallint', 'double precision', 'real' ] ],
			) );
			# Add to list
			foreach my $imOut ( @imageOutputs ) {
				push( @module_ST_list,
					[ $module, [$datasetMEX], $imOut->semantic_type ] );
			}
		# Add if it has been executed on every image in the dataset.
		} else {
			my @imageMEX_List;
			foreach my $image ( $dataset->images() ) {
				# Most recent execution of this module on this image.
				my $imageMEX = $factory->findObject( 'OME::ModuleExecution',
					image => $image,
					module => $module,
					__order => '!timestamp'
				);
				if( $imageMEX ) {
					push( @imageMEX_List, $imageMEX );
				} else {
					last;
				}
			}
			# Try the next module if it didn't have a MEX for every image in
			# the dataset.
			next unless scalar( @imageMEX_List ) == $dataset->count_images();

			# Load typed outputs
			my @imageOutputs = $module->outputs(
				'semantic_type.granularity' => 'I',
				'semantic_type.semantic_elements.data_column.sql_type' => [ 'in',
					[ 'bigint', 'integer', 'smallint', 'double precision', 'real' ] ]
			);
			# From here on this list holds STs, not output objects.
			@imageOutputs = map( $_->semantic_type, @imageOutputs );

			# Load untyped outputs that were produced by every imageMEX.
			my @prospectiveUntypedOutputs = $factory->findObjects(
				'OME::ModuleExecution::SemanticTypeOutput',
				module_execution => [ 'in', \@imageMEX_List ],
				'semantic_type.semantic_elements.data_column.sql_type' => [ 'in',
					[ 'bigint', 'integer', 'smallint', 'double precision', 'real' ] ],
				__distinct => 'semantic_type'
			);
			foreach my $candidateUntypedOutputs ( @prospectiveUntypedOutputs ) {
				# If the count of an untyped output is the same as the count of
				# MEXs, then there is one per MEX.
				# This test is needed because annotation may have produced
				# different metadata for different images.
				my $outputCount = $factory->countObjects(
					'OME::ModuleExecution::SemanticTypeOutput',
					semantic_type => $candidateUntypedOutputs->semantic_type,
					module_execution => [ 'in', \@imageMEX_List ]
				);
				push @imageOutputs, $candidateUntypedOutputs->semantic_type
					if( $outputCount == scalar( @imageMEX_List ) );
			}

			# Add to the list
			foreach my $ST ( @imageOutputs ) {
				push( @module_ST_list, [ $module, \@imageMEX_List, $ST ] );
			}
		}
	}

That was a mouthful.

ST based retrieval
------------------------------------

	# Find all image STs that have numeric values.
	my @prospectiveImageSTs = $factory->findObjects( 'OME::SemanticType',
		granularity => 'I',
		'semantic_elements.data_column.sql_type' => [ 'in',
			[ 'bigint', 'integer', 'smallint', 'double precision', 'real' ] ],
		__distinct => 'id'
	);
	my @imageST_List;
	foreach my $ST ( @prospectiveImageSTs ) {
		# verify that each image in the dataset has one of these STs.
		my $numImagesWithThisST = $factory->countAttributes( $ST,
			'image.datasets' => $dataset,
			__distinct => 'image'
		);
		# Numeric comparison: both sides are counts.
		push @imageST_List, $ST
			if( $numImagesWithThisST == $dataset->count_images );
	}
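
The "count equals number of images" test used here (and in the untyped
output checks above) can be tried out without a database. The sketch below
uses made-up image ids and ST names as stand-in data; the helper name
common_sts is hypothetical. An ST counts as common when the number of
distinct images carrying it equals the number of images.

```perl
use strict;
use warnings;

# Stand-in data: image id => names of the numeric STs it has attributes for.
# The image ids and ST names here are made up for illustration.
my %sts_by_image = (
	1 => [ 'ExposureTime', 'StackMean' ],
	2 => [ 'StackMean' ],
	3 => [ 'StackMean', 'ExposureTime', 'StackMean' ],
);

# Hypothetical helper: return the STs present on every image.
sub common_sts {
	my ($by_image) = @_;
	my $num_images = scalar( keys %$by_image );
	my %image_count;
	foreach my $image_id ( keys %$by_image ) {
		# Count each ST once per image, like __distinct => 'image'.
		my %seen;
		$image_count{$_}++ for grep { !$seen{$_}++ } @{ $by_image->{$image_id} };
	}
	my @common = grep { $image_count{$_} == $num_images } keys %image_count;
	return sort { $a cmp $b } @common;
}

print join( ',', common_sts( \%sts_by_image ) ), "\n";    # prints StackMean
```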

-Josiah

On Jan 30, 2006, at 12:18 PM, Harry Hochheiser wrote:

>
> Ok, Tony, here goes. I don't know all of the answers you're looking  
> for, so I'm going to ask Josiah for help.
>
> On Jan 30, 2006, at 10:51 AM, Tony Scelfo wrote:
>
>> I want to do the following progression.
>>
>> 1. Get a dataset.
>> 2. Using the dataset, get all the images in the dataset.
>
> I assume you're ok with these steps?
>
>> 3. Get a list of all the Semantic Types with numerical values that the
>> images have in common.
>
> ok. here it starts to get interesting. First, by way of clarification,  
> I assume that you mean:
> "find a list of all STs with numerical values that have instances for  
> the images in the dataset"?
>
> If this is not correct, please let me know.
>
> Assuming it is, this is where I need Josiah to clarify.  Let's start  
> with what we can do.
> According to Image.pm, the image type does have an accessor called  
> "all_features" which gets all of the features associated with the  
> image. You should be able to get this via an "all_features" entry in  
> the fields wanted to a retrieveObjects call in OME-JAVA.
>
> The next challenge is to find the semantic types for each of these
> features. I frankly don't know of any good way to do this. Given that
> the semantic types are distributed among n different tables, and that
> the feature table does not indicate the ST, it's certainly possible to
> (a) find all of the STs and (b) look for all of the instances of each
> ST, finding those that match the features returned by the
> "all_features" call.
>
> Although technically possible,  this sort of construction seems  
> somewhat scary. Josiah, is there any better way around this?
>
> Once you get the actual ST instances,
>
>> 4. Present the user with a list of the names of those Semantic Types.
>> 5. Based on user input through drop down boxes, retrieve the values of
>> the Semantic Types for each image in the dataset.
>> 6. Plot the values.
>>
>
> Presumably, you'll be ok with these steps once you get #3 taken care  
> of.
>
>> There will also be another variation where I will use image->feature
>> instead of dataset->image to look at the values of Feature level
>> Semantic Types within an image.
>>
>>
> Again, this should be straightforward once you get #3 down.
>
> harry
>