[ome-devel] Super-Resolution standard format

Wed Sep 30 21:38:35 BST 2015

On 9/30/15 12:03 PM, Simon Li wrote:
> Hi Alex
>
>     Intensity and TXYZ would require units. A factor should be provided to
>     convert the given units to standard units (e.g. seconds, nm, photons).
>     This would allow the format to record data in whichever units the
>     software prefers.
>
>
> That makes sense. However, for our first version how about we stick to 
> one unit for simplicity. nm? pixels? Hopefully the original image 
> contains enough metadata to convert between units.
I second the need for units from the very beginning.  I also think that 
it is important that the dataset can be interpreted without any need for 
the original image metadata.

>     It would be good to allow for storing the bounds of the data. This is
>     relevant to rendering images, density analysis and other data
>     post-processing.
>
>
> I think most of us assumed the localisation data would be stored 
> alongside the source images which would provide some (most?) of this 
> information.
Why that assumption?  One of the strengths of having a "universal" data 
format for localization data is that the localizations can be completely 
decoupled from the images themselves, and can be visualized/analyzed 
without the original image data (which would only be needed to re-run 
the fitting).  It seems quite important to me that the format can be 
used without any extraneous data.
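
As a rough illustration that the bounds can be recovered from the 
localizations alone, without the source images, here is a minimal 
Python sketch.  The record layout (dicts keyed by T, X, Y, Z) and the 
sample values are made up for the example.

```python
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def compute_bounds(records: Iterable[Mapping[str, float]],
                   fields: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Return (min, max) per field using only the localisation records."""
    bounds: Dict[str, Tuple[float, float]] = {}
    for rec in records:
        for f in fields:
            v = rec[f]
            # Start from (v, v) the first time a field is seen.
            lo, hi = bounds.get(f, (v, v))
            bounds[f] = (min(lo, v), max(hi, v))
    return bounds


# Made-up localisations already expressed in nm (X, Y, Z) and frames (T).
locs = [
    {"T": 1, "X": 812.0, "Y": 4410.5, "Z": -120.0},
    {"T": 2, "X": 815.3, "Y": 4408.9, "Z": -95.0},
    {"T": 7, "X": 2300.1, "Y": 150.2, "Z": 310.0},
]
print(compute_bounds(locs, ("T", "X", "Y", "Z")))
```

A single pass over the records is enough, so a reader could compute 
this on load if the header did not carry the bounds.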

>     Linking localisations together (i.e. marking the same molecule in
>     different frames) would require an ID field. These requirements mean
>     the file needs both a header and records. It could look something
>     like this:
>
>     Header:
>
>     Dimension,Min,Max,Unit,Conversion
>     ID,1,59,Count,2.34
>     T,1,1000,Frames,0.25
>     X,0,64,Pixels,107
>     Y,0,64,Pixels,107
>     Z,-500,500,nm,1
>     Intensity,10.987,1234,ADUs,0.025
>
>     This header corresponds to an image with 59 molecules, with an average
>     of 2.34 localisations per molecule (about 138 localisations in total). The
>     image was taken for 1000 frames with a frame time of 250 milliseconds,
>     with 107nm pixel pitch on a 64x64 pixel camera, a ±500nm depth of
>     field for 3D fitting, and a gain of 40.
>
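
For concreteness, a minimal Python sketch that reads the tabular header 
above with the standard csv module and applies the Conversion factor to 
the bounds.  It assumes that Conversion multiplies a value into an 
implicit standard unit (nm for X/Y, seconds for T); parse_header and 
converted_bounds are hypothetical helper names.  Applying the factor to 
the ID row would not give a meaningful quantity, so the example only 
uses it for X and T.

```python
import csv
import io

# The example header from the message, verbatim.
HEADER_TEXT = """Dimension,Min,Max,Unit,Conversion
ID,1,59,Count,2.34
T,1,1000,Frames,0.25
X,0,64,Pixels,107
Y,0,64,Pixels,107
Z,-500,500,nm,1
Intensity,10.987,1234,ADUs,0.025
"""


def parse_header(text):
    """Map each dimension to its bounds, unit and conversion factor."""
    spec = {}
    for row in csv.DictReader(io.StringIO(text)):
        spec[row["Dimension"]] = {
            "min": float(row["Min"]),
            "max": float(row["Max"]),
            "unit": row["Unit"],
            "factor": float(row["Conversion"]),
        }
    return spec


def converted_bounds(spec, dim):
    """Bounds multiplied by the conversion factor for that dimension."""
    d = spec[dim]
    return d["min"] * d["factor"], d["max"] * d["factor"]


spec = parse_header(HEADER_TEXT)
print(converted_bounds(spec, "X"))  # (0.0, 6848.0): 64 px * 107 nm/px
print(converted_bounds(spec, "T"))  # (0.25, 250.0): frames * 0.25 s
```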

The "Conversion" field is a bit funny.  It seems to imply that there is 
a "gold standard" unit, which itself is implicit.  They appear to be:
ID: average localizations per molecule (why is that number coupled to 
the unit-less "ID"?)
T: seconds
X: nm
Y: nm
Z: nm
Intensity: ???

At the very least, add an extra field stating which unit each 
Conversion factor converts to.  Even better, ask the file writer to 
carry out the conversion and list only the unit in which the data are 
actually saved.  Define one or two acceptable units for each field so 
that developers writing code to interpret the data know what to expect 
and can handle the data accordingly.
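
One way a reader could enforce "one or two acceptable units per field" 
is a small lookup table of accepted units and their factors to a 
canonical unit.  The table contents and the to_canonical name below are 
illustrative assumptions, not an agreed specification.  Intensity is 
left out because photons and raw counts cannot be converted into each 
other without a calibration.

```python
# Hypothetical table: for each field, the accepted units and the factor
# that converts a value in that unit to the canonical unit (listed first).
ACCEPTED_UNITS = {
    "T": {"s": 1.0, "ms": 1e-3},
    "X": {"nm": 1.0, "um": 1e3},
    "Y": {"nm": 1.0, "um": 1e3},
    "Z": {"nm": 1.0, "um": 1e3},
}


def to_canonical(field, unit, values):
    """Convert values stored in `unit` to the canonical unit for `field`."""
    try:
        factor = ACCEPTED_UNITS[field][unit]
    except KeyError:
        # Reject anything outside the agreed list rather than guessing.
        raise ValueError(
            f"unit {unit!r} is not accepted for field {field!r}") from None
    return [v * factor for v in values]


print(to_canonical("X", "um", [1.5, 2.0]))  # [1500.0, 2000.0]
```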

As for Intensity, I am confused by the units as presented.  Gain as in 
what?  Linear EM gain of the camera?  By far the nicest option would be 
to store intensity as the number of photons detected.  If that number is 
not available (which would be strange/sloppy, since all modern EM 
cameras can report it, and the camera can be calibrated quite easily), 
simply fall back to raw counts.  No need to know anything about the 
camera gain used.
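
A minimal sketch of that rule: store photons when a calibration is 
known, otherwise store raw counts and label them as such.  The function 
name and the calibration inputs (a photons-per-count factor and a 
baseline offset) are assumptions for illustration; a real EMCCD 
calibration may involve more than a single factor.

```python
from typing import Optional, Tuple


def intensity_value(raw_counts: float,
                    photons_per_count: Optional[float] = None,
                    offset: float = 0.0) -> Tuple[float, str]:
    """Return (value, unit) for the Intensity column.

    photons_per_count and offset are hypothetical calibration inputs; when
    no calibration is available the raw counts are stored unchanged.
    """
    if photons_per_count is None:
        return raw_counts, "counts"
    # Subtract the camera baseline, then scale counts to photons.
    return (raw_counts - offset) * photons_per_count, "photons"


print(intensity_value(1234.0))              # (1234.0, 'counts')
print(intensity_value(1100.0, 0.5, 100.0))  # (500.0, 'photons')
```

Either way the unit label travels with the value, so no camera settings 
are needed to interpret the column.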

> A particle ID field also makes sense. Could you (and other people on 
> this list) go into a bit more detail about how this additional summary 
> information would be used? Is there a significant advantage to storing 
> it as opposed to calculating it as necessary?
>
>     If tracing of localisations into molecules has not been performed,
>     the IDs would be sequential from 1 and the Conversion for the ID set
>     to 1. Note that others may have a different take on the use of an ID
>     column. For example, it could be used to record the unique ID of a
>     fitting candidate. In this case it would not be sequential, as
>     candidates that were rejected will be omitted from the results.
>
>
> Sounds reasonable.
>
>     Also note that results may be written directly by a parallel
>     processing fitting routine. In such a case they may not be ordered
>     by ID or by time T, and the bounds for some dimensions (ID, T, Z,
>     Intensity) may not be available when the header is written, so the
>     bounds can be optional. The bounds for XY should be available since
>     the image from the camera is a fixed size, but it is more flexible
>     if these are optional too.
>
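
Since parallel writers may emit records in any order, a reader can 
simply sort them afterwards, and summary values such as the number of 
molecules and the mean localisations per molecule (the number the 
example header stores as the ID Conversion) can be derived from the ID 
column when needed rather than stored.  A minimal Python sketch; the Loc 
record type and the sample values are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Loc(NamedTuple):
    """Hypothetical minimal localisation record."""
    id: int
    t: int
    x: float
    y: float


def sort_records(locs: Iterable[Loc]) -> List[Loc]:
    """Order records by frame, then ID, regardless of write order."""
    return sorted(locs, key=lambda r: (r.t, r.id))


def molecule_summary(locs: List[Loc]) -> Tuple[int, float]:
    """Distinct IDs and mean localisations per ID, computed on demand."""
    per_id = Counter(r.id for r in locs)
    n_molecules = len(per_id)
    mean = len(locs) / n_molecules if n_molecules else 0.0
    return n_molecules, mean


# Two workers' outputs concatenated in arbitrary order (made-up values).
merged = [Loc(2, 5, 10.0, 10.0), Loc(1, 1, 0.0, 0.0),
          Loc(1, 2, 0.1, 0.0), Loc(3, 9, 50.0, 50.0), Loc(1, 3, 0.2, 0.1)]
print([r.t for r in sort_records(merged)])  # [1, 2, 3, 5, 9]
print(molecule_summary(merged))
```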

There could be a facility for writing the header after the fact, so that 
the bounds can always be written.  That seems like it should be part of 
the library that is going to write these files.
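
A minimal sketch of such a facility, assuming a hypothetical binary 
layout with a fixed-size header (min/max for T, X, Y, Z as little-endian 
float64) followed by float32 records: the writer reserves the header 
space, streams the records while tracking bounds, then seeks back and 
fills in the header.  This needs a seekable output (a regular file 
rather than a pipe).

```python
import io
import struct

# Hypothetical layout: header = (min, max) for T, X, Y, Z as 8 little-endian
# float64 values; each record = T, X, Y, Z as 4 little-endian float32 values.
BOUNDS_HEADER = struct.Struct("<8d")
BOUNDS_RECORD = struct.Struct("<4f")


def write_with_bounds(stream, records):
    """Write records, then go back and fill in the header bounds."""
    start = stream.tell()
    stream.write(b"\0" * BOUNDS_HEADER.size)  # placeholder for the header
    mins = [float("inf")] * 4
    maxs = [float("-inf")] * 4
    for rec in records:
        packed = BOUNDS_RECORD.pack(*rec)
        stream.write(packed)
        # Track bounds of the values as stored (after float32 rounding).
        for i, v in enumerate(BOUNDS_RECORD.unpack(packed)):
            mins[i] = min(mins[i], v)
            maxs[i] = max(maxs[i], v)
    end = stream.tell()
    stream.seek(start)
    # Interleave as minT, maxT, minX, maxX, minY, maxY, minZ, maxZ.
    stream.write(BOUNDS_HEADER.pack(*[b for pair in zip(mins, maxs) for b in pair]))
    stream.seek(end)  # leave the stream positioned at the end of the data


buf = io.BytesIO()
write_with_bounds(buf, [(1, 10.0, 20.0, -5.0), (2, 30.0, 5.0, 15.0)])
print(BOUNDS_HEADER.unpack(buf.getvalue()[:BOUNDS_HEADER.size]))
```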
>
>     How the header is composed is open to debate. It could be XML,
>     allowing verbose descriptions of the data columns, or a simple
>     tabular format like the example shown above. Either way, I recommend
>     that it be easy to detect where the header ends and the data records begin.
>

No XML, pretty please!  Parsing speed should be the prime concern in 
this choice (datasets can be gigantic and will need to be parsed 
quickly), with writing speed second.  In fact, it makes a lot of sense 
to store the actual localizations in a binary format.
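
To illustrate how a plain-text header can sit in front of binary 
records while keeping the end of the header easy to detect, here is a 
minimal Python sketch.  The END_HEADER sentinel and the 20-byte record 
layout (uint32 ID, then float32 T, X, Y, Z) are assumptions made up for 
this example.

```python
import io
import struct

SENTINEL = b"END_HEADER\n"            # hypothetical end-of-header marker
LOC_RECORD = struct.Struct("<I4f")    # ID, then T, X, Y, Z (little-endian)


def write_file(stream, header_lines, records):
    """Text header lines, the sentinel, then packed binary records."""
    for line in header_lines:
        stream.write(line.encode("ascii") + b"\n")
    stream.write(SENTINEL)
    for rec in records:
        stream.write(LOC_RECORD.pack(*rec))


def read_file(data: bytes):
    """Split at the first sentinel and unpack the fixed-size records."""
    split = data.index(SENTINEL)
    header = data[:split].decode("ascii").splitlines()
    body = data[split + len(SENTINEL):]
    return header, list(LOC_RECORD.iter_unpack(body))


buf = io.BytesIO()
write_file(buf, ["Dimension,Unit", "X,nm"],
           [(1, 0.25, 100.0, 200.0, -50.0), (2, 0.5, 110.0, 190.0, 0.0)])
print(read_file(buf.getvalue()))
```

For very large files, numpy.frombuffer with a matching structured dtype 
would avoid unpacking records one at a time in Python.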
> If anyone else has comments on the format we're working towards you 
> should speak up soon!
Voila!

Best,

Nico
