<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<br>
<div class="moz-cite-prefix">On 9/30/15 12:03 PM, Simon Li wrote:<br>
</div>
<blockquote
cite="mid:CAMvbRBFZf-czBT=47cpFCNJkD8WPicZUGn6qPn8XGMyDpuQtvw@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<div dir="ltr">
<div>
<div>Hi Alex<br>
<br>
</div>
Intensity and TXYZ would require units. A factor should be
provided to<br>
</div>
<div>
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
convert the given units to standard (e.g. seconds,
nm, photons). This<br>
would allow the format to record using units of
choice for the software.<br>
</blockquote>
<div><br>
</div>
<div>That makes sense. However, for our first version
how about we stick to one unit for simplicity. nm?
pixels? Hopefully the original image contains enough
metadata to convert between units.<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
I second the need for units from the very beginning. I also think
that it is important that the dataset can be interpreted without any
need for the original image metadata.<br>
<br>
<blockquote
cite="mid:CAMvbRBFZf-czBT=47cpFCNJkD8WPicZUGn6qPn8XGMyDpuQtvw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
It would be good to allow for storing the bounds of
the data. This is<br>
relevant to rendering images, density analysis and
other data<br>
post-processing.<br>
</blockquote>
<div><br>
</div>
<div>I think most of us assumed the localisation data
would be stored alongside the source images which
would provide some (most?) of this information.<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
Why that assumption? One of the strengths of having a "universal"
data format for localization data is that the localization data can
be completely uncoupled from the images themselves, and can be
visualized/analyzed without need for the original image data (which
would be only needed if one would want to re-execute the fitting).
It seems quite important to me that the format can be used without
the need for any extraneous data.<br>
<br>
<blockquote
cite="mid:CAMvbRBFZf-czBT=47cpFCNJkD8WPicZUGn6qPn8XGMyDpuQtvw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
To make it possible to link localisations together
(i.e. to mark the<br>
same molecule in different frames) would require an
ID field. These<br>
requirements make the file need both a header and
records. It could look<br>
something like this:<br>
<br>
Header:<br>
<br>
Dimension,Min,Max,Unit,Conversion<br>
ID,1,59,Count,2.34<br>
T,1,1000,Frames,0.25<br>
X,0,64,Pixels,107<br>
Y,0,64,Pixels,107<br>
Z,-500,500,nm,1<br>
Intensity,10.987,1234,ADUs,0.025<br>
<br>
This header corresponds to an image with 59
molecules, with an average<br>
of 2.34 localisations per molecule (117
localisations in total). The<br>
image was taken for 1000 frames at a frame interval of
250 milliseconds,<br>
with 107nm pixel pitch on a 64x64 pixel camera,
500nm depth of field for<br>
3D fitting, with a gain of 40.<br>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
The "Conversion" field is a bit funny. It seems to imply that there
is a "gold standard" unit, which itself is implicit. They appear to
be:<br>
ID: average localizations per molecule (why is that number coupled
to the unit-less "ID"?)<br>
T: seconds<br>
X: nm<br>
Y: nm<br>
Z: nm<br>
Intensity: ???<br>
<br>
At the very least, add an extra field stating what the real unit is
for each field. Even better, ask the file writer to carry out the
conversion and only list the unit in which the data are actually
saved. Define one or two acceptable units for each field so that
code writers interpreting the data know what to expect and can
handle data accordingly.<br>
<br>
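To make that concrete, here is a rough Python sketch of what a reader
could do if the header only lists the unit the data are actually saved
in. Everything named here is my own invention for illustration
(ALLOWED_UNITS, check_header, the four-column header layout), not a
proposed spec. The numbers are the quoted example with its Conversion
column already applied, and that the intensity result is in photons is
an assumption on my part.<br>
<br>
<pre>
```python
import csv
import io

# Hypothetical whitelist of acceptable units per field (illustration only).
ALLOWED_UNITS = {
    "T": {"s", "frames"},
    "X": {"nm", "pixels"},
    "Y": {"nm", "pixels"},
    "Z": {"nm"},
    "Intensity": {"photons", "counts"},
}

# Assumed header layout: no Conversion column, the writer already converted.
HEADER = """Dimension,Min,Max,Unit
T,0.25,250,s
X,0,6848,nm
Y,0,6848,nm
Z,-500,500,nm
Intensity,0.274675,30.85,photons
"""


def check_header(text):
    """Return {dimension: unit}, raising ValueError on an unexpected unit."""
    units = {}
    for row in csv.DictReader(io.StringIO(text)):
        dim, unit = row["Dimension"], row["Unit"]
        if unit not in ALLOWED_UNITS.get(dim, set()):
            raise ValueError(f"{dim}: unit {unit!r} is not allowed")
        units[dim] = unit
    return units


print(check_header(HEADER))
```
</pre>
With a short list of acceptable units, a reader only needs one small
lookup table and never has to guess at an implicit "standard" unit.<br>
<br>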
As for Intensity, I am confused by the units as presented. Gain as
in what? Linear EM gain of the camera? It would by far be nicest
to store intensity data in number of photons detected. If that
number is not available (which would be strange/sloppy, since all
modern EM cameras can give this number, and the camera can be quite
easily calibrated), simply revert to raw counts. No need to know
anything about the camera gain used.<br>
<br>
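On the writer side, the conversion I have in mind is roughly the
following. This is only a sketch: the function name and the
calibration values in the example (offset, electrons per ADU, EM gain)
are made up, and it ignores quantum efficiency, so strictly speaking
the result is detected photoelectrons.<br>
<br>
<pre>
```python
def adu_to_photons(counts, offset, electrons_per_adu, em_gain=1.0):
    """Convert raw camera counts (ADU) to detected photoelectrons.

    counts            raw value reported by the camera
    offset            camera bias level in ADU (assumed known from calibration)
    electrons_per_adu conversion factor of the readout electronics
    em_gain           linear EM gain; 1.0 for a camera without EM gain
    """
    return (counts - offset) * electrons_per_adu / em_gain


# Hypothetical calibration values, purely for illustration.
print(adu_to_photons(1234, offset=100, electrons_per_adu=10.0, em_gain=200.0))
```
</pre>
The point is that this happens once, in the software that knows the
camera, and the file itself then only says "photons" (or "counts" if
no calibration is available).<br>
<br>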
<blockquote
cite="mid:CAMvbRBFZf-czBT=47cpFCNJkD8WPicZUGn6qPn8XGMyDpuQtvw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">A particle ID field also makes
sense. Could you (and other people on this list) go
into a bit more detail about how this additional
summary information would be used? Is there a
significant advantage to storing it as opposed to
calculating it as necessary?<br>
<div>
<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
If tracing of localisations into molecules has not
been performed then<br>
the IDs would be sequential from 1 and the
Conversion for the ID set to<br>
1. Note that others may have a different take on the
use of an ID<br>
column. For example it could be used to record the
unique ID of a<br>
fitting candidate. In this case it would not be
sequential as candidates<br>
that were rejected will be omitted from the results.<br>
</blockquote>
<div><br>
</div>
<div>Sounds reasonable.<br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
Also note that results may be written directly by a
parallel processing<br>
fitting routine. In such a case they may not
necessarily be ordered by<br>
ID or by time T, and the bounds for some dimensions
(ID, T, Z,<br>
Intensity) may not be available when the header is
written. In this case<br>
the bounds can be optional. The bounds for XY should
be available given<br>
the image from the camera is a fixed size, however
it is more flexible<br>
if these are optional too.<br>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
There could be a facility for writing the header after the fact, so
that the bounds can always be written. Seems that should be part of
the library that is going to write these files.
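<br>
<br>
One way such a library could do this is to reserve a fixed-size header,
stream the records, and then seek back and fill in the bounds. A
minimal Python sketch, assuming a single-line text header of fixed
width and (t, x) records only; HEADER_WIDTH and write_with_late_header
are names I made up for the example:<br>
<br>
<pre>
```python
import io

HEADER_WIDTH = 64  # bytes reserved for the header line (assumed value)


def write_with_late_header(stream, records):
    """Stream (t, x) records, then go back and fill in the T bounds."""
    start = stream.tell()
    stream.write(" " * (HEADER_WIDTH - 1) + "\n")  # placeholder header
    t_min = t_max = None
    for t, x in records:
        # Records may arrive in any order, so track the bounds as we go.
        t_min = t if t_min is None else min(t_min, t)
        t_max = t if t_max is None else max(t_max, t)
        stream.write(f"{t},{x}\n")
    end = stream.tell()
    header = f"T,{t_min},{t_max},frames"
    if len(header) >= HEADER_WIDTH:
        raise ValueError("header does not fit in the reserved space")
    stream.seek(start)
    stream.write(header.ljust(HEADER_WIDTH - 1) + "\n")  # overwrite in place
    stream.seek(end)


buf = io.StringIO()
write_with_late_header(buf, [(3, 10.5), (1, 99.0), (7, 42.25)])
print(buf.getvalue())
```
</pre>
The same idea works with a real file opened for writing, which is why a
parallel fitting routine would not need to know the bounds up front.<br>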
<blockquote
cite="mid:CAMvbRBFZf-czBT=47cpFCNJkD8WPicZUGn6qPn8XGMyDpuQtvw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
How the header is composed is open to debate. It
could be XML allowing<br>
verbose descriptions of the data columns. Or a
simple tabular format<br>
like the example shown above. But I recommend it
should be easy to<br>
detect when the header has ended and the rows of
data records begin.<br>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
No XML, pretty please! Speed of parsing should be of prime concern
in this choice (as data-sets can be gigantic and will need to be
parsed fast), with speed of writing second. In fact, it makes a lot
of sense to store the actual localizations in a binary format.<br>
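<br>
As an illustration of how little code a binary record format needs,
here is a sketch using Python's standard struct module. The record
layout (two unsigned 32-bit integers for ID and frame, then x, y, z and
intensity as 32-bit floats, network byte order) is purely an assumption
for the example, not a proposal for the actual layout.<br>
<br>
<pre>
```python
import struct

# Assumed record layout (illustration only): uint32 id, uint32 frame,
# then x, y, z, intensity as float32. "!" means network byte order with
# standard sizes and no padding, so each record is exactly 24 bytes.
RECORD = struct.Struct("!IIffff")


def pack_records(rows):
    """Serialize an iterable of (id, frame, x, y, z, intensity) tuples."""
    return b"".join(RECORD.pack(*row) for row in rows)


def unpack_records(blob):
    """Parse a bytes object back into a list of record tuples."""
    return list(RECORD.iter_unpack(blob))


# Example values chosen to be exactly representable as float32.
rows = [(1, 1, 100.0, 200.0, -50.0, 1234.0), (2, 1, 6848.0, 0.5, 0.0, 10.0)]
blob = pack_records(rows)
print(len(blob), unpack_records(blob) == rows)
```
</pre>
Fixed-size records also mean a reader can jump straight to record N
without parsing anything before it.<br>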
<blockquote
cite="mid:CAMvbRBFZf-czBT=47cpFCNJkD8WPicZUGn6qPn8XGMyDpuQtvw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<div>
If anyone else has comments on the format we're
working towards you should speak up soon!<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
Voila!<br>
<br>
Best,<br>
<br>
Nico<br>
<br>
<br>
</body>
</html>