Typesetting unicode characters
Hi list,

An XML document includes the 👍 emoji, as shown in the following snippet:

  <html>
  <head><meta charset="utf8"/></head>
  <body>
  <div class="bubblerx">
  <p>Thumbs up emoji: </p>
  </div>
On Wed, Mar 30, 2022 at 12:32:11AM -0700, Thangalin via ntg-context wrote:
An XML document includes the 👍 emoji, as shown in the following snippet:
<html> <head><meta charset="utf8"/></head> <body> <div class="bubblerx"> <p>Thumbs up emoji: </p>
Try the correct escape sequence :-) That's &#x1F44D; -- or equivalently &#128077;

Best,
Arthur
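For anyone who wants to check the numbers: a valid numeric character reference names the Unicode code point (U+1F44D for thumbs up), not the two UTF-16 code units Java stores internally. The sketch below is only an illustration; the class `CharRef` and its method names are hypothetical. The third method shows what a serializer would emit if it escaped each code unit separately, and whether that matches the rejected sequence in the original post is an assumption.

```java
// Hypothetical helper, for illustration only.
final class CharRef {
    /** One decimal reference per Unicode code point (valid XML). */
    static String toDecimalRefs(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }

    /** One hexadecimal reference per Unicode code point (valid XML). */
    static String toHexRefs(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp ->
            sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';'));
        return sb.toString();
    }

    /**
     * One decimal reference per UTF-16 code unit. For characters outside the
     * Basic Multilingual Plane this yields a surrogate pair, which is NOT a
     * valid XML character reference. Assumed failure mode, not confirmed.
     */
    static String toCodeUnitRefs(String text) {
        StringBuilder sb = new StringBuilder();
        for (char unit : text.toCharArray()) {
            sb.append("&#").append((int) unit).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String thumbsUp = "\uD83D\uDC4D";                 // U+1F44D
        System.out.println(toHexRefs(thumbsUp));          // &#x1F44D;
        System.out.println(toDecimalRefs(thumbsUp));      // &#128077;
        System.out.println(toCodeUnitRefs(thumbsUp));     // &#55357;&#56397;
    }
}
```

The first two outputs are the forms Arthur gives; an XML parser should reject the third because surrogate values are not legal characters.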
On the rare chance that someone else stumbles across this problem ...

By default, Java's Xalan transformer for creating XML documents does not correctly encode emojis. Instead of &#128077; for the thumbs up emoji, Xalan emits a different escape sequence which, as Arthur pointed out, is not a valid entity encoding.

One solution is to use Saxonica's Saxon 11 transformer, which produces the expected output:

  <html>
  <head><meta charset="utf8"/></head>
  <body>
  <p id="caret">the 👍 emoji</p>
  </body>
  </html>

In Java, switching to Saxon entails installing the JAR files for Saxon and its resolvers. Then set the system property before invoking the XML transformer:

  System.setProperty(
    "javax.xml.transform.TransformerFactory",
    "net.sf.saxon.TransformerFactoryImpl" );

ConTeXt handles the emoji from the transformed XML file without any issues.

Thank you, Arthur.
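A slightly fuller sketch of the switch, in case it helps: it sets the property only when the Saxon factory class is actually on the classpath, then runs an identity transform through whichever factory JAXP picks. The class `SaxonSwitch` and its method names are hypothetical; the property key and the factory class name are the ones quoted above. How the JDK's built-in transformer writes the emoji may vary by version, so the sketch makes no claim about its output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, for illustration only.
final class SaxonSwitch {
    static final String FACTORY_KEY = "javax.xml.transform.TransformerFactory";
    static final String SAXON_FACTORY = "net.sf.saxon.TransformerFactoryImpl";

    /** Selects Saxon if its factory class can be loaded; returns whether it was selected. */
    static boolean preferSaxon() {
        try {
            Class.forName(SAXON_FACTORY);
            System.setProperty(FACTORY_KEY, SAXON_FACTORY);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // Saxon JARs not installed; the JAXP default stays in effect
        }
    }

    /** Copies the XML input to a string using the currently configured factory. */
    static String identityTransform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Saxon selected: " + preferSaxon());
        // Thumbs up (U+1F44D) written as a surrogate pair in Java source.
        System.out.println(identityTransform("<p id=\"caret\">the \uD83D\uDC4D emoji</p>"));
    }
}
```

The property must be set before the first call to `TransformerFactory.newInstance()` that should use Saxon, which is why `preferSaxon()` runs first in `main`.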