[ome-devel] Simple flat binary file format

Kevin Mader kevin.mader at gmail.com
Thu Mar 26 17:14:40 GMT 2015


So I am trying to come up with a good, simple binary file format that works
well with 'Big Data' platforms like Hadoop, Spark, and S3 (see issue
https://github.com/scifio/scifio/issues/265). The idea is to keep the
storage as simple as possible, and the first implementation of such a format
is shown here:
https://github.com/thunder-project/thunder/tree/master/python/thunder/utils/data/fish/series
It consists of the binary file accompanied by a conf.json file with the
following contents:

{
  "valuetype": "uint8",
  "nkeys": 3,
  "keytype": "int16",
  "dims": [
    76,
    87,
    2
  ],
  "nvalues": 240,
  "input": "key02_00000-key01_00000-key00_00000.bin"
}
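To show how little code a reader could need, here is a minimal Python sketch that uses only the standard library (json and struct). It assumes something the conf.json above does not state: each record in the .bin file is nkeys keys of keytype followed immediately by nvalues values of valuetype, little-endian, with no header or padding. The type-name table and the function names are hypothetical. Under that assumption one record of the example is 246 bytes (3 int16 keys at 2 bytes each plus 240 uint8 values at 1 byte each), and "dims" is not needed for decoding.

```python
import json
import struct

# Hypothetical mapping from conf.json type names to struct format codes
# (only common numeric types are listed).
STRUCT_CODES = {
    "uint8": "B", "int8": "b",
    "uint16": "H", "int16": "h",
    "uint32": "I", "int32": "i",
    "int64": "q", "float32": "f", "float64": "d",
}


def record_format(conf):
    """Struct format for one record: nkeys keys, then nvalues values.

    Assumes little-endian byte order and no padding between fields or records.
    """
    return "<%d%s%d%s" % (
        conf["nkeys"], STRUCT_CODES[conf["keytype"]],
        conf["nvalues"], STRUCT_CODES[conf["valuetype"]],
    )


def read_records(conf, data):
    """Yield (keys, values) tuples from the raw bytes of one .bin file."""
    rec = struct.Struct(record_format(conf))
    if len(data) % rec.size != 0:
        raise ValueError("data length is not a multiple of the record size")
    nkeys = conf["nkeys"]
    for fields in rec.iter_unpack(data):
        yield fields[:nkeys], fields[nkeys:]


def load(conf_path, bin_path):
    """Read conf.json and one binary file; "dims" is kept only as metadata."""
    with open(conf_path) as f:
        conf = json.load(f)
    with open(bin_path, "rb") as f:
        data = f.read()
    return conf, list(read_records(conf, data))
```

For the example above this would be called as load("conf.json", "key02_00000-key01_00000-key00_00000.bin").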

Since Big Data platforms normally work with key-value pairs, the idea would
be to have a key consisting of several numbers (nkeys) of a given type
(keytype) and a value consisting of an array of type (valuetype) with
dimensions (dims), all of this spread across multiple files so they can
easily be written and read in parallel (or from different machines writing
to a shared file system).
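One reason fixed-length records suit Hadoop, Spark and S3 is that a file can be cut into byte ranges at exact record boundaries, so separate workers can each read their own piece (on S3, via a ranged GET) without scanning for delimiters. A minimal sketch, again assuming the keys-then-values, little-endian, headerless record layout; record_size, split_ranges and read_range are hypothetical helper names.

```python
import struct


def record_size(nkeys, key_code, nvalues, value_code):
    """Bytes per record: nkeys keys then nvalues values, little-endian, unpadded."""
    return struct.calcsize("<%d%s%d%s" % (nkeys, key_code, nvalues, value_code))


def split_ranges(file_size, rec_size, nsplits):
    """Cut a file into at most nsplits (start, end) byte ranges on record boundaries.

    Each range holds a whole number of records, so a worker can decode its
    piece without looking at any other part of the file.
    """
    if nsplits < 1:
        raise ValueError("nsplits must be at least 1")
    if file_size % rec_size != 0:
        raise ValueError("file size is not a multiple of the record size")
    nrecords = file_size // rec_size
    if nrecords == 0:
        return []
    per_split = -(-nrecords // nsplits)  # ceiling division
    return [
        (first * rec_size, min(first + per_split, nrecords) * rec_size)
        for first in range(0, nrecords, per_split)
    ]


def read_range(path, start, end):
    """Read bytes [start, end) of a local file (the role a ranged GET plays on S3)."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)
```

In PySpark, SparkContext.binaryRecords(path, recordLength) already reads flat files of fixed-length records, so a size computed like record_size(3, "h", 240, "B") (246 bytes for the conf.json example) could be passed to it directly.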

Does anyone have any suggestions for making a simple format around this?
The best case would be to have something that could be easily read into or
written from ImageJ, Matlab, Python, or whatever other tool is around with
just a few lines of code and no dependencies.
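For the writing side, a dependency-free Python sketch that produces a .bin file plus a conf.json with the same fields as the example. Assumptions: records are stored keys first, little-endian, every record has the same number of keys and values, and "input" simply names the data file. The names encode and write_dataset, and the file name part-00000.bin used in the check below, are hypothetical.

```python
import json
import os
import struct

# Hypothetical mapping from conf.json type names to struct format codes.
STRUCT_CODES = {"uint8": "B", "int16": "h", "int32": "i",
                "float32": "f", "float64": "d"}


def encode(records, keytype, valuetype, dims, bin_name):
    """Pack (keys, values) pairs into flat bytes and build the matching conf dict.

    Assumes every record has the same number of keys and values; struct
    raises struct.error if one does not, or if a value is out of range.
    """
    if not records:
        raise ValueError("need at least one record")
    nkeys, nvalues = len(records[0][0]), len(records[0][1])
    rec = struct.Struct("<%d%s%d%s" % (nkeys, STRUCT_CODES[keytype],
                                       nvalues, STRUCT_CODES[valuetype]))
    data = b"".join(rec.pack(*keys, *values) for keys, values in records)
    conf = {
        "valuetype": valuetype,
        "nkeys": nkeys,
        "keytype": keytype,
        "dims": list(dims),
        "nvalues": nvalues,
        "input": bin_name,
    }
    return conf, data


def write_dataset(directory, bin_name, records, keytype, valuetype, dims):
    """Write bin_name and conf.json into directory; returns the conf dict."""
    conf, data = encode(records, keytype, valuetype, dims, bin_name)
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, bin_name), "wb") as f:
        f.write(data)
    with open(os.path.join(directory, "conf.json"), "w") as f:
        json.dump(conf, f, indent=2)
    return conf
```

Keeping the packing in encode separate from the file I/O means the same bytes could be handed to any storage layer (local disk, HDFS, S3) by whatever writer a given platform provides.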

Thanks
Kevin



