From thangalin@gmail.com Wed Mar 30 09:32:26 2022
From: Thangalin
To: ntg-context@ntg.nl
Subject: [NTG-context] Typesetting unicode characters
Date: Wed, 30 Mar 2022 00:32:11 -0700

Hi list,

An XML document includes the 👍 emoji, as shown in the following snippet:

  <html>
    <head><meta charset="utf8"/></head>
    <body>
      <div class="bubblerx">
        <p>Thumbs up emoji: &#55357;&#56397;</p>
      </div>
    </body>
  </html>

The document is typeset using ConTeXt, but the thumbs up emoji isn't in the PDF. Neither the Noto Emoji nor the Open Sans Emoji font renders it.

Does anyone have a minimal example that shows how to typeset such escaped entities?

When the emoji is added directly to a document, it works fine:

  \definefont [TextFontEmoji] [opensansemoji]

  \starttext
    \TextFontEmoji{Thumbs up emoji: 👍}
  \stoptext

Is there something special that needs to be set for ConTeXt to interpret the escaped Unicode values as an emoji?

Thank you!

From arthur.reutenauer@normalesup.org Wed Mar 30 09:48:22 2022
From: Arthur Rosendahl
To: ntg-context@ntg.nl
Subject: Re: [NTG-context] Typesetting unicode characters
Date: Wed, 30 Mar 2022 09:48:19 +0200

On Wed, Mar 30, 2022 at 12:32:11AM -0700, Thangalin via ntg-context wrote:
> An XML document includes the 👍 emoji, as shown in the following snippet:
>
> <html>
>   <head><meta charset="utf8"/></head>
>   <body>
>     <div class="bubblerx">
>       <p>Thumbs up emoji: &#55357;&#56397;</p>

Try the correct escape sequence :-) That’s &#x1F44D; -- or equivalently &#128077;

  Best,

    Arthur

From thangalin@gmail.com Thu Mar 31 10:06:41 2022
From: Thangalin
To: ntg-context@ntg.nl
Subject: Re: [NTG-context] Typesetting unicode characters
Date: Thu, 31 Mar 2022 01:06:27 -0700

On the rare chance that someone else stumbles across this problem ...

By default, Java's Xalan transformer for creating XML documents does not correctly encode emojis. Instead of &#x1F44D; for the thumbs up emoji, Xalan encodes it as &#55357;&#56397;. As Arthur pointed out, this is not a valid entity encoding.

One solution is to use Saxonica's Saxon 11 transformer, which produces the expected output:

  <html>
    <head><meta charset="utf8"/></head>
    <body>
      <p id="caret">the 👍 emoji</p>
    </body>
  </html>

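For anyone wondering where 55357 and 56397 come from: they are the UTF-16 high and low surrogates of U+1F44D, written out as two separate character references instead of one. A minimal sketch of the arithmetic follows; the class and method names are made up for illustration, and only java.lang is used.

```java
class SurrogateDemo {
    /** Combines a UTF-16 surrogate pair into the code point it encodes. */
    public static int combine(int high, int low) {
        // Same arithmetic as Character.toCodePoint, spelled out.
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    /** Formats a code point as a hexadecimal numeric character reference. */
    public static String toReference(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
    }

    public static void main(String[] args) {
        int codePoint = combine(55357, 56397);
        System.out.println(codePoint);               // 128077
        System.out.println(toReference(codePoint));  // &#x1F44D;
        // Cross-check against the JDK's own implementation.
        System.out.println(
            codePoint == Character.toCodePoint((char) 55357, (char) 56397)); // true
    }
}
```

So the two references Xalan wrote and the single reference Arthur suggested describe the same character; only the single one is a legal XML character reference.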
In Java, switching to Saxon entails installing the JAR files for Saxon and its resolvers. Then set the system property before invoking the XML transformer:

  System.setProperty(
    "javax.xml.transform.TransformerFactory",
    "net.sf.saxon.TransformerFactoryImpl" );

ConTeXt handles the emoji from the transformed XML file without any issues.

Thank you, Arthur.
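
If switching transformers is not an option, the surrogate references could in principle be repaired after serialization. What follows is only a sketch of that idea and not something proposed in this thread: the class and method names are hypothetical, it handles five-digit decimal references only, and it assumes the two halves always appear as adjacent references, the way Xalan wrote them above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SurrogateReferenceRepair {
    // Two adjacent five-digit decimal references, e.g. "&#55357;&#56397;".
    private static final Pattern PAIR = Pattern.compile("&#(\\d{5});&#(\\d{5});");

    /** Replaces each surrogate-pair reference with one reference to the real code point. */
    public static String repair(String xml) {
        StringBuilder out = new StringBuilder();
        Matcher m = PAIR.matcher(xml);
        int pos = 0;
        while (m.find(pos)) {
            int high = Integer.parseInt(m.group(1));
            int low = Integer.parseInt(m.group(2));
            boolean isPair = high >= 0xD800 && high <= 0xDBFF
                          && low >= 0xDC00 && low <= 0xDFFF;
            if (isPair) {
                int codePoint = Character.toCodePoint((char) high, (char) low);
                out.append(xml, pos, m.start())
                   .append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase())
                   .append(';');
                pos = m.end();
            } else {
                // Not a surrogate pair: keep the first reference as-is and
                // rescan from the second, which may itself start a pair.
                int secondStart = m.start(2) - 2; // step back over "&#"
                out.append(xml, pos, secondStart);
                pos = secondStart;
            }
        }
        out.append(xml, pos, xml.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(repair("<p>Thumbs up emoji: &#55357;&#56397;</p>"));
    }
}
```

Running a string through `repair` before handing the file to ConTeXt would turn the Xalan output into the form Arthur suggested. Using a transformer that serializes correctly in the first place, as described above, remains the cleaner fix.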