<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <br>

    <div class="moz-cite-prefix">On 9/30/15 12:03 PM, Simon Li wrote:<br>

    </div>

    <blockquote

cite="mid:CAMvbRBFZf-czBT=47cpFCNJkD8WPicZUGn6qPn8XGMyDpuQtvw@mail.gmail.com"

      type="cite">


      <div dir="ltr">

        <div>

          <div>Hi Alex<br>

            <br>

          </div>

          Intensity and TXYZ would require units. A factor should be

          provided to<br>

        </div>

        <div>

          <div>

            <div>

              <div class="gmail_extra">

                <div class="gmail_quote">

                  <blockquote class="gmail_quote" style="margin:0px 0px

                    0px 0.8ex;border-left:1px solid

                    rgb(204,204,204);padding-left:1ex">

                    convert the given units to standard (e.g. seconds,

                    nm, photons). This<br>

                    would allow the format to record using units of

                    choice for the software.<br>

                  </blockquote>

                  <div><br>

                  </div>

                  <div>That makes sense. However, for our first version

                    how about we stick to one unit for simplicity. nm?

                    pixels? Hopefully the original image contains enough

                    metadata to convert between units.<br>

                  </div>

                </div>

              </div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    I second the need for units from the very beginning.  I also think

    that it is important that the dataset can be interpreted without any

    need for the original image metadata.<br>

    <br>

    <blockquote

cite="mid:CAMvbRBFZf-czBT=47cpFCNJkD8WPicZUGn6qPn8XGMyDpuQtvw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>

          <div>

            <div>

              <div class="gmail_extra">

                <div class="gmail_quote">

                  <blockquote class="gmail_quote" style="margin:0px 0px

                    0px 0.8ex;border-left:1px solid

                    rgb(204,204,204);padding-left:1ex">

                    It would be good to allow for storing the bounds of

                    the data. This is<br>

                    relevant to rendering images, density analysis and

                    other data<br>

                    post-processing.<br>

                  </blockquote>

                  <div><br>

                  </div>

                  <div>I think most of us assumed the localisation data

                    would be stored alongside the source images which

                    would provide some (most?) of this information.<br>

                  </div>

                </div>

              </div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    Why that assumption?  One of the strengths of having a "universal"

    data format for localization data is that the localization data can

    be completely uncoupled from the images themselves, and can be

    visualized/analyzed without need for the original image data (which

    is needed only if one wants to re-execute the fitting). 

    It seems quite important to me that the format can be used without

    the need for any extraneous data.<br>

    <br>

    <blockquote

cite="mid:CAMvbRBFZf-czBT=47cpFCNJkD8WPicZUGn6qPn8XGMyDpuQtvw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>

          <div>

            <div>

              <div class="gmail_extra">

                <div class="gmail_quote">

                  <blockquote class="gmail_quote" style="margin:0px 0px

                    0px 0.8ex;border-left:1px solid

                    rgb(204,204,204);padding-left:1ex">

                    To make it possible to link localisations together

                    (i.e. to mark the<br>

                    same molecule in different frames) would require an

                    ID field. These<br>

                    requirements make the file need both a header and

                    records. It could look<br>

                    something like this:<br>

                    <br>

                    Header:<br>

                    <br>

                    Dimension,Min,Max,Unit,Conversion<br>

                    ID,1,59,Count,2.34<br>

                    T,1,1000,Frames,0.25<br>

                    X,0,64,Pixels,107<br>

                    Y,0,64,Pixels,107<br>

                    Z,-500,500,nm,1<br>

                    Intensity,10.987,1234,ADUs,0.025<br>

                    <br>

                    This header corresponds to an image with 59

                    molecules, with an average<br>

                    of 2.34 localisations per molecule (117

                    localisations in total). The<br>

                    image was taken for 1000 frames at a frame rate of

                    250 milliseconds,<br>

                    with 107nm pixel pitch on a 64x64 pixel camera,

                    500nm depth of field for<br>

                    3D fitting, with a gain of 40.<br>

                  </blockquote>

                </div>

              </div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    The "Conversion" field is a bit funny.  It seems to imply that there

    is a "gold standard" unit, which itself is implicit.  They appear to

    be:<br>

    ID: average localizations per molecule (why is that number coupled

    to the unit-less "ID"?)<br>

    T: seconds<br>

    X: nm<br>

    Y: nm<br>

    Z: nm<br>

    Intensity: ???<br>

    <br>

    At the very least, add an extra field stating what the real unit is

    for each field.  Even better, ask the file writer to carry out the

    conversion and only list the unit in which the data are actually

    saved.  Define one or two acceptable units for each field so that

    code writers interpreting the data know what to expect and can

    handle data accordingly.<br>
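    <br>
    To make this concrete, here is a minimal Python sketch of
    writer-side conversion.  The field names follow the example header
    above; the table of accepted units and the function names are
    hypothetical, made up for illustration, and not part of any agreed
    format:<br>
    <pre>
```python
# Sketch only: the unit table and function names are hypothetical,
# not an agreed specification.

# One or two acceptable units per field, so readers know what to expect.
ACCEPTED_UNITS = {
    "T": ("s", "frames"),
    "X": ("nm", "pixels"),
    "Y": ("nm", "pixels"),
    "Z": ("nm",),
    "Intensity": ("photons", "counts"),
}


def check_unit(field, unit):
    """Raise ValueError if `unit` is not accepted for `field`."""
    allowed = ACCEPTED_UNITS.get(field)
    if allowed is None:
        raise ValueError("unknown field: " + field)
    if unit not in allowed:
        raise ValueError(field + " must be in one of: " + ", ".join(allowed))
    return unit


def convert_on_write(field, values, factor, target_unit):
    """Scale values once in the writer and report only the saved unit."""
    check_unit(field, target_unit)
    return target_unit, [v * factor for v in values]


# X bounds from the example header: 0..64 pixels at 107 nm per pixel.
unit, x_bounds_nm = convert_on_write("X", [0, 64], 107, "nm")
print(unit, x_bounds_nm)
```
    </pre>
    With this approach the file carries no "Conversion" column at all:
    a reader only has to check the unit name against a short list.<br>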

    <br>

    As for Intensity, I am confused by the units as presented.  Gain as

    in what?  Linear EM gain of the camera?  By far the nicest option is

    to store intensity data in number of photons detected.  If that

    number is not available (which would be strange/sloppy, since all

    modern EM cameras can give this number, and the camera can be quite

    easily calibrated), simply revert to raw counts.  No need to know

    anything about the camera gain used.<br>

    <br>

    <blockquote

cite="mid:CAMvbRBFZf-czBT=47cpFCNJkD8WPicZUGn6qPn8XGMyDpuQtvw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>

          <div>

            <div>

              <div class="gmail_extra">

                <div class="gmail_quote">A particle ID field also makes

                  sense. Could you (and other people on this list) go

                  into a bit more detail about how this additional

                  summary information would be used? Is there a

                  significant advantage to storing it as opposed to

                  calculating it as necessary?<br>

                  <div>

                    <br>

                  </div>

                  <blockquote class="gmail_quote" style="margin:0px 0px

                    0px 0.8ex;border-left:1px solid

                    rgb(204,204,204);padding-left:1ex">

                    If tracing of localisations into molecules has not

                    been performed then<br>

                    the IDs would be sequential from 1 and the

                    Conversion for the ID set to<br>

                    1. Note that others may have a different take on the

                    use of an ID<br>

                    column. For example it could be used to record the

                    unique ID of a<br>

                    fitting candidate. In this case it would not be

                    sequential as candidates<br>

                    that were rejected will be omitted from the results.<br>

                  </blockquote>

                  <div><br>

                  </div>

                  <div>Sounds reasonable.<br>

                  </div>

                  <div> </div>

                  <blockquote class="gmail_quote" style="margin:0px 0px

                    0px 0.8ex;border-left:1px solid

                    rgb(204,204,204);padding-left:1ex">

                    Also note that results may be written directly by a

                    parallel processing<br>

                    fitting routine. In such a case they may not

                    necessarily be ordered by<br>

                    ID or by time T, and the bounds for some dimensions

                    (ID, T, Z,<br>

                    Intensity) may not be available when the header is

                    written. In this case<br>

                    the bounds can be optional. The bounds for XY should

                    be available given<br>

                    the image from the camera is a fixed size, however

                    it is more flexible<br>

                    if these are optional too.<br>

                  </blockquote>

                </div>

              </div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    There could be a facility for writing the header after the fact, so

    that the bounds can always be written.  That should probably be part of

    the library that is going to write these files.
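    <br>
    <br>
    A minimal Python sketch of what I mean, assuming a fixed-width
    header slot that is reserved up front and filled in once all
    records are written.  The slot width, the header layout and the
    function name are hypothetical:<br>
    <pre>
```python
import io

HEADER_WIDTH = 64  # hypothetical fixed size reserved for the header line


def write_with_late_header(stream, z_values):
    """Stream records first, then go back and fill in the Z bounds."""
    # Reserve space for the header; the bounds are unknown at this point.
    stream.write(" " * (HEADER_WIDTH - 1) + "\n")
    z_min = float("inf")
    z_max = float("-inf")
    for z in z_values:
        stream.write("%g\n" % z)
        z_min = min(z_min, z)
        z_max = max(z_max, z)
    # Rewind and overwrite the placeholder with the real bounds.
    # A real writer would also have to check that the header fits the slot.
    header = "Z,%g,%g,nm" % (z_min, z_max)
    stream.seek(0)
    stream.write(header.ljust(HEADER_WIDTH - 1) + "\n")
    stream.seek(0, io.SEEK_END)


buf = io.StringIO()
write_with_late_header(buf, [120.5, -480.0, 33.0])
lines = buf.getvalue().splitlines()
print(lines[0].rstrip())
```
    </pre>
    The same idea works on a real file opened for writing, as long as
    the header slot has a known size, so a parallel fitting routine can
    write records in any order and still end up with complete bounds.<br>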

    <blockquote

cite="mid:CAMvbRBFZf-czBT=47cpFCNJkD8WPicZUGn6qPn8XGMyDpuQtvw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>

          <div>

            <div>

              <div class="gmail_extra">

                <div class="gmail_quote">

                  <div> </div>

                  <blockquote class="gmail_quote" style="margin:0px 0px

                    0px 0.8ex;border-left:1px solid

                    rgb(204,204,204);padding-left:1ex">

                    How the header is composed is open to debate. It

                    could be XML allowing<br>

                    verbose descriptions of the data columns. Or a

                    simple tabular format<br>

                    like the example shown above. But I recommend it

                    should be easy to<br>

                    detect when the header has ended and the rows of

                    data records begin.<br>

                  </blockquote>

                </div>

              </div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    No XML, pretty please!  Speed of parsing should be of prime concern

    in this choice (as datasets can be gigantic and will need to be

    parsed fast), with speed of writing second.  In fact, it makes a lot

    of sense to store the actual localizations in a binary format.<br>
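    <br>
    As a rough illustration, fixed-size binary records could look like
    the Python sketch below.  The record layout (two 32-bit integers
    for ID and T, four 32-bit floats for X, Y, Z and Intensity) and the
    big-endian byte order are assumptions for the sake of the example,
    not a proposal for the final format:<br>
    <pre>
```python
import struct

# Hypothetical record layout: ID and T as 32-bit ints, then X, Y, Z and
# Intensity as 32-bit floats, big-endian ("!"), no padding.
RECORD = struct.Struct("!iiffff")


def pack_records(rows):
    """Serialise (ID, T, X, Y, Z, Intensity) tuples into one bytes blob."""
    return b"".join(RECORD.pack(*row) for row in rows)


def unpack_records(blob):
    """Parse a blob of fixed-size records back into a list of tuples."""
    return list(RECORD.iter_unpack(blob))


rows = [
    (1, 1, 3424.0, 1712.0, -250.0, 30.5),
    (1, 2, 3430.5, 1710.25, -248.0, 28.0),
]
blob = pack_records(rows)
print(RECORD.size, len(blob))
```
    </pre>
    Because every record has the same size, a reader can jump straight
    to record n at byte offset n * RECORD.size without parsing anything
    before it.<br>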

    <blockquote

cite="mid:CAMvbRBFZf-czBT=47cpFCNJkD8WPicZUGn6qPn8XGMyDpuQtvw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>

          <div>

            <div>

              <div class="gmail_extra">

                <div class="gmail_quote">

                  <div>

                    If anyone else has comments on the format we're

                    working towards you should speak up soon!<br>

                  </div>

                </div>

              </div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    Voila!<br>

    <br>

    Best,<br>

    <br>

    Nico<br>

    <br>

    <br>

  </body>

</html>