Modify ToUnicode with Goodies
Hi,

On Windows, we have the consola font. Consider the MWE:

\starttext
\definedfont[name:consola*default at 12 pt]
-
\stoptext

The output PDF is correctly generated with recent versions of ConTeXt LMTX. The hyphen is, however, mapped to a soft hyphen (https://unicode-table.com/en/00AD/) by means of the ToUnicode table, which contains:

beginbfchar
<015E> <00AD>
endbfchar

Consequently, when copying the text from the PDF and pasting it into an editor or a console, the soft hyphen is pasted.

I would like to change the ToUnicode information to an ordinary hyphen-minus (https://unicode-table.com/en/002D/):

beginbfchar
<015E> <002D>
endbfchar

I have tried with a goodies file and an updated MWE:

--- 8< ------------------------------------------
return {
    name = "consola",
    version = "1.00",
    comment = "",
    author = "",
    copyright = "",
    remapping = {
        tounicode = true,
        unicodes = {
            hyphen = 0x002D,
        },
    },
}
--- 8< ------------------------------------------
\definefontfeature[consola][mode=base, goodies=consola, unicoding=yes]

\starttypescript[mono][consolas]
  \definefontsynonym[ConsolasRegular][file:consola][features=consola]
\stoptypescript

\starttypescript[mono][consolas]
  \definefontsynonym[Mono][ConsolasRegular]
\stoptypescript

\definetypeface[Body][tt][mono][consolas][default]
\setupbodyfont[Body, ss, 10pt]

\starttext
\tt -
\stoptext
--- 8< ------------------------------------------

Unfortunately, this has no effect.

Please tell me how to correctly update ToUnicode information with a goodies file.

Cheers,
Christoph
On 6/3/2021 11:25 AM, Christoph Reller wrote:
> Please tell me how to correctly update ToUnicode information with a goodies file.

It is (as always with fonts) more complex than that, (1) because different unicode slots share the same shape and (2) because we have some (already) old hyphen patching code for messy fonts (which is kind of bad anyway).

We actually want all these hyphens to have the right tounicode even if they share shapes (I already had a comment about looking into that but never ran into a font that needed it).

So, after some experimenting I decided to solve that in a different way (LMTX only, because there I have more control) ... I need to run some checks and then do an upload so that you can test (also other files if possible).

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
On Mon, Jun 7, 2021 at 7:05 PM Hans Hagen wrote:
Finally I found the time to do some extended testing on this, and it seems that for my use case the LMTX version 2021-06-09 behaves as I would expect: hyphens are now extracted as hyphens.

Thanks a lot for your implementation, Hans!

Cheers,
Christoph
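
For anyone who wants to check the mapping without copying and pasting from a viewer, the bfchar entries quoted in this thread can also be inspected with a small script. What follows is only a minimal Python sketch, not part of ConTeXt: it assumes the ToUnicode CMap stream has already been extracted from the PDF as uncompressed text, and the names parse_bfchar and soft_hyphen_codes are hypothetical helpers made up for this illustration. It handles only bfchar sections, not bfrange.

```python
import re

# One "beginbfchar ... endbfchar" section of a ToUnicode CMap.
_SECTION = re.compile(r"beginbfchar(.*?)endbfchar", re.DOTALL)
# One "<source code> <destination>" pair inside such a section.
_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text):
    """Return {glyph code: Unicode string} for every bfchar entry found."""
    mapping = {}
    for section in _SECTION.findall(cmap_text):
        for src, dst in _PAIR.findall(section):
            # The destination is hex-encoded UTF-16BE, possibly several units.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def soft_hyphen_codes(mapping):
    """Return the sorted glyph codes whose ToUnicode text contains U+00AD."""
    return sorted(code for code, text in mapping.items() if "\u00ad" in text)


if __name__ == "__main__":
    # The entry reported in the original message.
    reported = parse_bfchar("beginbfchar <015E> <00AD> endbfchar")
    for code in soft_hyphen_codes(reported):
        print(f"glyph <{code:04X}> maps to SOFT HYPHEN (U+00AD)")
```

With the entry from the original report, the script flags glyph <015E>; with the desired entry (<015E> <002D>) it flags nothing.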