On Tue, May 21, 2013 at 10:01:30AM +0200, Hans Hagen wrote:
On 5/20/2013 11:36 PM, Khaled Hosny wrote:
On Mon, May 20, 2013 at 11:22:41PM +0200, Khaled Hosny wrote:
My suggestion is to just use \phys_units_text_* always, since the decomposed, two-character form is the preferred one for those two units.
“In normal use, it is better to represent degrees Celsius ‘°C’ with a sequence of U+00B0 DEGREE SIGN + U+0043 LATIN CAPITAL LETTER C, rather than U+2103 DEGREE CELSIUS. For searching, treat these two sequences as identical. Similarly, the sequence U+00B0 DEGREE SIGN + U+0046 LATIN CAPITAL LETTER F is preferred over U+2109 DEGREE FAHRENHEIT, and those two sequences should be treated as identical for searching.”
http://www.unicode.org/versions/Unicode6.2.0/ch15.pdf#G20445
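To illustrate what "treat these two sequences as identical" could mean for a search routine, here is a minimal Python sketch. It is not ConTeXt code; the names PREFERRED, fold_degree_units and same_for_search are made up for this example, and the table covers only the two characters discussed here.

```python
import unicodedata

# Hypothetical folding table, limited to the two compatibility
# characters from the quoted passage.
PREFERRED = {
    "\u2103": "\u00b0C",  # DEGREE CELSIUS    -> DEGREE SIGN + C
    "\u2109": "\u00b0F",  # DEGREE FAHRENHEIT -> DEGREE SIGN + F
}

def fold_degree_units(text):
    """Replace the single-character forms with the preferred sequences."""
    return text.translate(str.maketrans(PREFERRED))

def same_for_search(a, b):
    """Compare two strings with both spellings treated as identical."""
    return fold_degree_units(a) == fold_degree_units(b)

# The table agrees with Unicode's own compatibility decompositions.
for single, sequence in PREFERRED.items():
    assert unicodedata.normalize("NFKC", single) == sequence
```

Full NFKC normalization would fold these two characters as well, but it also rewrites many unrelated compatibility characters, which is why the sketch uses an explicit table.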
Searching should be OK thanks to the tounicode entry, which mentions the two characters ... how about the visual aspect? Should we care about that?
I think for ConTeXt purposes we should just ignore the composed forms. They are there only for compatibility with some legacy CJK encodings, and their use is all but discouraged. If a user enters those code points directly, then they are on their own, so there is nothing to be done here either. In short, this is just some of the compatibility nonsense crippling Unicode; we are better off pretending these characters do not exist.

Regards,
Khaled