On 01/20/2017 03:25 AM, Takeshi Abe wrote:
> Preparing a patch for tdf#105382 [1], I came across a question about
> the character encoding of the path part of a URL representing a file's
> location. I wonder if the original path of such a URL (before
> percent-encoding) can be in an encoding other than UTF-8, or even in a
> different charset, due to e.g. the code page of some legacy filesystem.
> Is it possible?
> And, if so, is there any reasonable way to tell the encoding?

A conforming URL itself, by definition, consists only of characters from
a subset of ASCII.
For file URLs, there never was a definition of how to interpret the octets
encoded in the URL's path component, so OOo/LO adopted the
convention of always interpreting them as UTF-8.  (So any code that
converts between file URLs and native pathnames needs to map
between UTF-8 and the relevant native pathname encoding, which LO
assumes to be the one reported by osl_getThreadTextEncoding.)
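
To illustrate the mapping described above, here is a rough Python sketch. It
is not LO code: the function names are hypothetical, and the explicit
native_encoding parameter merely stands in for whatever
osl_getThreadTextEncoding would report.  It assumes the percent-encoded
octets are valid UTF-8 and that every character is representable in the
native encoding.

```python
from urllib.parse import quote, unquote_to_bytes


def file_url_path_to_native(url_path: str, native_encoding: str) -> bytes:
    """Hypothetical helper: file URL path -> native pathname bytes."""
    # Undo the percent-encoding to recover the raw octets.
    octets = unquote_to_bytes(url_path)
    # By the convention above, those octets are always UTF-8.
    text = octets.decode("utf-8")
    # Re-encode for the filesystem (assumed encoding, passed in by the caller).
    return text.encode(native_encoding)


def native_to_file_url_path(native: bytes, native_encoding: str) -> str:
    """Hypothetical helper: native pathname bytes -> file URL path."""
    # Interpret the native bytes in the (assumed) native encoding.
    text = native.decode(native_encoding)
    # Emit UTF-8 octets, percent-encoding everything except unreserved
    # characters and the path separator.
    return quote(text.encode("utf-8"), safe="/")
```

For example, with a Latin-1 filesystem the native name b"/tmp/caf\xe9.odt"
would appear in the URL as /tmp/caf%C3%A9.odt, not as /tmp/caf%E9.odt.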


