[ome-users] Interpretation of special characters

Wed Jan 28 10:25:24 GMT 2015

Dear Bio-Formats developers,

I have a slightly challenging question, at least it is so for me :-)
We have images where users can write a description into a free-text
field. Sometimes users manage to enter special characters in such
fields, probably by copy-pasting from a Word document into
MetaXpress.

The image linked below contains such a case. The character that the
user seems to have entered is stored as a single byte with the value
decimal 150, hex 96. At least this is what I see in a hex editor when
checking the file. When I inspect the image metadata with Matlab, the
character is reported as "0xFF96", which at least makes it possible
to recover the original value by using only the lower byte.
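
For what it is worth, 0xFF96 looks like what a sign-extending cast of
the raw byte would produce. The Java sketch below is only my guess at
the mechanism, not something I have confirmed in Matlab; the class and
method names are made up for illustration.

```java
final class SignExtensionDemo {
    /** Widens a raw byte to a 16-bit char value using a sign-extending cast. */
    static int signExtendedChar(int rawByte) {
        byte b = (byte) rawByte;  // 0x96 becomes -106 as a signed Java byte
        return (char) b;          // sign bits are kept when widening: 0xFF96
    }

    /** Recovers the original unsigned byte value by masking the low 8 bits. */
    static int lowByte(int charValue) {
        return charValue & 0xFF;
    }

    public static void main(String[] args) {
        int widened = signExtendedChar(0x96);
        // prints "widened: 0xFF96, low byte: 150"
        System.out.printf("widened: 0x%04X, low byte: %d%n", widened, lowByte(widened));
    }
}
```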

With Bio-Formats, I did not manage to recover the original value.
Instead, the character is reported as decimal 65533, which as far as
I can tell is the Unicode replacement character (U+FFFD), used when
input could not be decoded. I tried various ways of setting the
locale, so far without success. Can you please advise: is handling
such cases supported at all? Should I be able to read the characters
exactly as the bytes appear in the file with Bio-Formats? How can I
set the correct locale for Matlab and/or Java? I assume we have
de_CH.something here...
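
To illustrate what I suspect is happening: decoding the single byte
0x96 as UTF-8 yields exactly 65533, because 0x96 on its own is not a
valid UTF-8 sequence. The sketch below is plain Java and does not call
Bio-Formats; whether Bio-Formats actually decodes the comment as UTF-8
is an assumption on my part, and ByteDecodingDemo is a made-up name.
The windows-1252 charset is present in common JDKs but is not
guaranteed by the Java specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ByteDecodingDemo {
    /** Decodes one raw byte with the given charset; returns the first UTF-16 code unit. */
    static int decode(int rawByte, Charset charset) {
        String s = new String(new byte[] { (byte) rawByte }, charset);
        return s.charAt(0);
    }

    public static void main(String[] args) {
        // Malformed as UTF-8, so Java substitutes U+FFFD: prints 65533
        System.out.println(decode(0x96, StandardCharsets.UTF_8));
        // ISO-8859-1 maps every byte to the same code point: prints 150
        System.out.println(decode(0x96, StandardCharsets.ISO_8859_1));
        // windows-1252 maps 0x96 to the en dash U+2013: prints 8211
        System.out.println(decode(0x96, Charset.forName("windows-1252")));
    }
}
```

If the text was pasted from Word on Windows, windows-1252 seems the
likeliest source encoding, in which case the intended character would
be an en dash (U+2013).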

The image I used is here:
  http://data.marssoft.de/b0bCDac-T130_wB09_s25_z0_t1_cRFP_u001.tif

Rough instructions to reproduce:
  aFileName = '/tmp/b0bCDac-T130_wB09_s25_z0_t1_cRFP_u001.tif';
  javaaddpath('.../bioformats_package.jar');
  vTiffParser = loci.formats.tiff.TiffParser(aFileName);
  vString = vTiffParser.getComment();
  uint32(vString.charAt(93))
  % this returns 65533 for me, not (dec) 150 / (hex) 96

All the best, and thanks for your great work,

     Mario Emmenlauer

-- 
Mario Emmenlauer BioDataAnalysis             Mobil: +49-(0)151-68108489
Balanstrasse 43                    mailto: mario.emmenlauer * unibas.ch
D-81669 München                          http://www.marioemmenlauer.de/