[ome-users] OME-TIFF: problem with the "micron" character (micrometer unit)

Curtis Rueden ctrueden at wisc.edu
Wed Sep 6 14:42:36 BST 2017


Hi Roger,

> The problem lies with the behaviour of Java on Windows.

Nice detective work. :-)

I am curious: do you know whether it works with:

* The PowerShell console
* A Cygwin terminal
* An MSYS (e.g., Git Bash) terminal

?

Regards,
Curtis

--
Curtis Rueden
LOCI software architect - https://loci.wisc.edu/software
ImageJ2 lead, Fiji maintainer - https://imagej.net/User:Rueden


On Wed, Sep 6, 2017 at 7:57 AM, Roger Leigh <rleigh at dundee.ac.uk> wrote:

> On 01/09/17 20:30, Christoph Gohlke wrote:
>
>> One issue is that the tiffcomment utility outputs XML that is not well
>> formed. OME-XML should be UTF-8 encoded, but tiffcomment apparently
>> encodes it as Latin-1 (ISO-8859-1) or similar (Bio-Formats 5.6.0 on
>> Windows 10).
>> Try re-encoding the XML file (e.g. quick and dirty in Python 3):
>>
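>> with open('comment.xml', 'rb') as f:
>>     xml = f.read()
>> # Interpret the bytes as ISO-8859-1 text, then store that text as UTF-8
>> xml = xml.decode('iso-8859-1').encode('utf-8')
>> with open('comment.xml', 'wb') as f:
>>     f.write(xml)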
>>
>> Another issue could be that the XML in the ome.tiff file is not encoded
>> correctly. Open the ome.tiff file with a hex editor. The micro sign
>> (U+00B5) should be stored as two bytes (C2 B5), not as a single byte (B5).
>>
>
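
The byte-level check described above can also be scripted. What follows is
a minimal Python sketch, assuming the OME-XML has already been extracted to
a file such as comment.xml. The function name check_micro_encoding is made
up for illustration, and the single-byte test is only a heuristic: it looks
for a B5 byte that does not follow another non-ASCII byte.

```python
import re

# A lone B5 byte (not following another non-ASCII byte) suggests a
# single-byte encoding such as ISO-8859-1 or CP1252. Heuristic only.
_BARE_MICRO = re.compile(rb"(?<![\x80-\xff])\xb5")


def check_micro_encoding(data: bytes):
    """Return (is_valid_utf8, utf8_micro_count, bare_micro_count)."""
    try:
        data.decode("utf-8")
        is_valid_utf8 = True
    except UnicodeDecodeError:
        is_valid_utf8 = False
    # C2 B5 is the UTF-8 encoding of U+00B5 (MICRO SIGN).
    utf8_micro_count = data.count(b"\xc2\xb5")
    bare_micro_count = len(_BARE_MICRO.findall(data))
    return is_valid_utf8, utf8_micro_count, bare_micro_count
```

Reading comment.xml in binary mode and passing the bytes to this function
gives a quick verdict: an invalid-UTF-8 result together with a non-zero
last count points to the single-byte form.
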
> The problem lies with the behaviour of Java on Windows.
>
> tiffcomment uses System.out.println() to print the comment to standard
> output, and this uses the default encoding.  On Windows, this is likely
> to be an old 8-bit codepage such as CP1252, which will result in the
> output being recoded from UTF-8 to whatever codepage is in use.  Please
> see
> https://stackoverflow.com/questions/24803733/default-character-encoding-for-java-console-output
> for further details.
>
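
To make the recoding concrete, here is a small Python sketch of what
happens to the micro sign when the same text is written through a UTF-8
stream and through a CP1252 stream. It only imitates the effect described
above and is not the Java code path itself; encode_for_console is a
hypothetical helper name.

```python
def encode_for_console(text: str, console_encoding: str) -> bytes:
    """Encode text as a stream using console_encoding would emit it."""
    # With errors="replace", Python writes "?" for unencodable characters.
    return text.encode(console_encoding, errors="replace")


MICRO = "\u00b5"  # MICRO SIGN

utf8_bytes = encode_for_console(MICRO + "m", "utf-8")
cp1252_bytes = encode_for_console(MICRO + "m", "cp1252")

# Hex dump of both results: two bytes for the sign in UTF-8, one in CP1252.
print(utf8_bytes.hex(), "|", cp1252_bytes.hex())  # prints: c2b56d | b56d
```

The single B5 byte is exactly what an XML parser expecting UTF-8 then
rejects as malformed.
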
> You could try to force the use of UTF-8 by making this change to the
> bf.bat script which is part of bftools:
>
>
> diff --git a/tools/bf.bat b/tools/bf.bat
> index 0c56b79388..6f3146e956 100644
> --- a/tools/bf.bat
> +++ b/tools/bf.bat
> @@ -22,6 +22,14 @@ if "%BF_MAX_MEM%" == "" (
>  )
>  set BF_FLAGS=%BF_FLAGS% -Xmx%BF_MAX_MEM%
>
> +rem Set the file encoding
> +if "%BF_ENCODING%" == "" (
> +  rem Set UTF-8 by default
> +  set BF_ENCODING=UTF-8
> +)
> +set "BF_FLAGS=%BF_FLAGS% -Dfile.encoding=%BF_ENCODING%"
> +
> +
>  rem Skip the update check if the NO_UPDATE_CHECK flag is set.
>  if not "%NO_UPDATE_CHECK%" == "" (
>    set BF_FLAGS=%BF_FLAGS% -Dbioformats_can_do_upgrade_check=false
>
> It's not something which we can enable by default, because this is not a
> setting which is supposed to be used publicly, but it may help in this
> case.
>
> An alternative solution would be to use a Unix platform such as Linux,
> FreeBSD or macOS with a UTF-8 locale, where the output will always be
> correctly encoded as UTF-8.
>
> As a better long term solution, we could reopen System.out to use a
> UTF-8 encoding, or to use raw bytes and transfer everything verbatim.
>
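
The idea of writing through a stream with an explicit UTF-8 encoding,
instead of relying on the platform default, can be sketched as follows.
This is a Python analogue chosen because Python is already used in this
thread; it is not the Java change itself, and write_utf8 is a hypothetical
helper name.

```python
import io


def write_utf8(binary_stream, text: str) -> None:
    """Write text to a binary stream as UTF-8, ignoring the locale default."""
    wrapper = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")
    wrapper.write(text)
    wrapper.flush()
    # Detach so the underlying stream stays open for the caller.
    wrapper.detach()
```

For console output the binary stream would be sys.stdout.buffer, which
bypasses the console codepage in the same spirit as the "raw bytes"
option mentioned above.
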
>
> Kind regards,
> Roger
>
> --
> Dr Roger Leigh -- Open Microscopy Environment
> Wellcome Trust Centre for Gene Regulation and Expression,
> College of Life Sciences, University of Dundee, Dow Street,
> Dundee DD1 5EH Scotland UK   Tel: (01382) 386364
>
> The University of Dundee is a registered Scottish Charity, No: SC015096
> _______________________________________________
> ome-users mailing list
> ome-users at lists.openmicroscopy.org.uk
> http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-users
>