[ome-users] OME-TIFF: problem with the "micron" character (micrometer unit)

Roger Leigh rleigh at dundee.ac.uk
Thu Sep 7 17:05:58 BST 2017


On 06/09/17 14:42, Curtis Rueden wrote:
> Hi Roger,
>
>  > The problem lies with the behaviour of Java on Windows.
>
> Nice detective work. :-)
>
> I am curious: do you know whether it works with:
>
> * The PowerShell console
> * A Cygwin terminal
> * An MSYS (e.g., Git Bash) terminal

I tested with this sample program:

--✂-----------------------------------
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

class test
{
     public static void main(String[] args) {
         // Default charset as reported by the NIO API.
         System.out.println("Default Charset=" + Charset.defaultCharset());
         // Raw value of the file.encoding system property.
         System.out.println("File Encoding=" + System.getProperty("file.encoding"));
         // Encoding picked by a writer created without an explicit charset.
         System.out.println("Default Charset in Use=" + getDefaultCharSet());
     }

     private static String getDefaultCharSet() {
         OutputStreamWriter writer =
             new OutputStreamWriter(new ByteArrayOutputStream());
         return writer.getEncoding();
     }
}
--✂-----------------------------------

derived from a sample in
https://stackoverflow.com/questions/1749064/how-to-find-the-default-charset-encoding-in-java

Here are the results:

C:\Users\rleigh\Desktop>java -version
openjdk version "1.8.0_121-2-ojdkbuild"
OpenJDK Runtime Environment (build 1.8.0_121-2-ojdkbuild-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)


CMD
---

C:\Users\rleigh\Desktop>java -cp . test
Default Charset=windows-1252
File Encoding=Cp1252
Default Charset in Use=Cp1252

C:\Users\rleigh\Desktop>java -Dfile.encoding=UTF-8 -cp . test
Default Charset=UTF-8
File Encoding=UTF-8
Default Charset in Use=UTF8

POWERSHELL
----------

PS C:\Users\rleigh\Desktop> java -cp . test
Default Charset=windows-1252
File Encoding=Cp1252
Default Charset in Use=Cp1252
PS C:\Users\rleigh\Desktop> java "-Dfile.encoding=UTF-8" -cp . test
Default Charset=UTF-8
File Encoding=UTF-8
Default Charset in Use=UTF8
PS C:\Users\rleigh\Desktop>

CYGWIN
------

rleigh@WIN-T9OAGQ4B85D /cygdrive/c/Users/rleigh/Desktop
$ java -cp . test
Default Charset=windows-1252
File Encoding=Cp1252
Default Charset in Use=Cp1252

rleigh@WIN-T9OAGQ4B85D /cygdrive/c/Users/rleigh/Desktop
$ java -Dfile.encoding=UTF-8 -cp . test
Default Charset=UTF-8
File Encoding=UTF-8
Default Charset in Use=UTF8

MSYS2
-----

rleigh@WIN-T9OAGQ4B85D MSYS /c/Users/rleigh/Desktop
$ /c/Program\ Files/ojdkbuild/java-1.8.0-openjdk-1.8.0.121-2/bin/java -cp . test
Default Charset=windows-1252
File Encoding=Cp1252
Default Charset in Use=Cp1252

rleigh@WIN-T9OAGQ4B85D MSYS /c/Users/rleigh/Desktop
$ /c/Program\ Files/ojdkbuild/java-1.8.0-openjdk-1.8.0.121-2/bin/java -Dfile.encoding=UTF-8 test -cp . test
Default Charset=UTF-8
File Encoding=UTF-8
Default Charset in Use=UTF8


So there is no difference between any of the environments.  In all
cases, passing -Dfile.encoding=UTF-8 makes UTF-8 the default charset,
which is what any output stream writer created without an explicit
charset will use.  If you don't set it, the default remains Cp1252.
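
For code that we control, an alternative to relying on the default is to
pass the charset explicitly when the writer is created.  The following
is only a minimal sketch of that idea: the class name MicronEncoding and
its helper methods are made up for illustration and are not taken from
Bio-Formats.  It encodes the string "µm" once as UTF-8 and once as
windows-1252, independently of file.encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of Bio-Formats or OME.
class MicronEncoding
{
    // Encode text with an explicitly chosen charset, so the result does
    // not depend on the platform default or on file.encoding.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Format bytes as space-separated upper-case hex pairs.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String unit = "\u00B5m"; // MICRO SIGN followed by 'm'
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println("UTF-8:        " + hex(encode(unit, StandardCharsets.UTF_8)));
        System.out.println("windows-1252: " + hex(encode(unit, cp1252)));
    }
}
```

The micro sign is the two bytes C2 B5 in UTF-8 but the single byte B5 in
windows-1252, which is why text written with one encoding and read back
with the other shows a corrupted unit.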

Both Cygwin and MSYS2 report (running "locale") that the locale
variables are all en_US.UTF-8, and in both cases LANG=en_US.UTF-8 is set
in the environment.  Clearly, Java on Windows is ignoring this.
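
To see the mismatch from within Java itself, the codeset named in LANG
can be compared with the JVM's default charset.  This is a rough sketch
only; the class name LocaleCheck and the codesetOf helper are
hypothetical, and the parsing assumes the usual
language_TERRITORY.codeset@modifier layout of the variable.

```java
import java.nio.charset.Charset;

// Hypothetical example class; not part of Bio-Formats or OME.
class LocaleCheck
{
    // Extract the codeset part of a POSIX-style locale name such as
    // "en_US.UTF-8" or "de_DE.UTF-8@euro"; returns null if there is none.
    static String codesetOf(String lang) {
        if (lang == null) {
            return null;
        }
        int dot = lang.indexOf('.');
        if (dot < 0) {
            return null;
        }
        String codeset = lang.substring(dot + 1);
        int at = codeset.indexOf('@');
        return at < 0 ? codeset : codeset.substring(0, at);
    }

    // True if the named codeset resolves to the JVM's default charset.
    static boolean matchesDefault(String codeset) {
        if (codeset == null) {
            return false;
        }
        try {
            return Charset.forName(codeset).equals(Charset.defaultCharset());
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name.
            return false;
        }
    }

    public static void main(String[] args) {
        String lang = System.getenv("LANG");
        String codeset = codesetOf(lang);
        System.out.println("LANG=" + lang);
        System.out.println("LANG codeset=" + codeset);
        System.out.println("Default Charset=" + Charset.defaultCharset());
        System.out.println("Match=" + matchesDefault(codeset));
    }
}
```

Given the results above, I would expect this to report no match in the
Cygwin and MSYS2 shells unless -Dfile.encoding=UTF-8 is passed.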


Regards,
Roger

--
Dr Roger Leigh -- Open Microscopy Environment
Wellcome Trust Centre for Gene Regulation and Expression,
College of Life Sciences, University of Dundee, Dow Street,
Dundee DD1 5EH Scotland UK   Tel: (01382) 386364

The University of Dundee is a registered Scottish Charity, No: SC015096

