On Mon, Jun 7, 2021 at 7:05 PM Hans Hagen <j.hagen@xs4all.nl> wrote:
On 6/3/2021 11:25 AM, Christoph Reller wrote:
> Hi,
>
> On Windows, we have the consola font. Consider the MWE:
>
> \starttext
> \definedfont[name:consola*default at 12 pt]
> -
> \stoptext
>
> The output PDF is correctly generated with recent versions of ConTeXt
> LMTX. The hyphen is, however, mapped to a soft hyphen
> <https://unicode-table.com/en/00AD/> by means of the ToUnicode table
> which contains:
>      beginbfchar
>          <015E> <00AD>
>      endbfchar
>
> Consequently, when copying the text from the PDF and pasting in an
> editor or a console, the soft hyphen is pasted.
>
> I would like to change the ToUnicode information to an ordinary
> hyphen-minus <https://unicode-table.com/en/002D/>:
>      beginbfchar
>          <015E> <002D>
>      endbfchar
>
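For readers following the thread, here is a minimal sketch of how the two bfchar entries quoted above can be read. It is plain Python for illustration only, not ConTeXt code and not how LMTX builds the table. The helper name parse_bfchar is hypothetical, and the sketch assumes each destination is a single UTF-16BE character, as in the example.

```python
import re
import unicodedata

def parse_bfchar(cmap_text):
    """Return {source_code: unicode_string} for every bfchar entry (hypothetical helper)."""
    mapping = {}
    # Each bfchar block sits between "beginbfchar" and "endbfchar".
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        # Each entry is a pair of hex strings: <source code> <destination>.
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # The destination hex string is text encoded as UTF-16BE.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

# The entry as reported in the thread, and the entry the poster would like.
current = """
beginbfchar
    <015E> <00AD>
endbfchar
"""
wanted = current.replace("<00AD>", "<002D>")

for label, text in (("current", current), ("wanted", wanted)):
    for code, char in parse_bfchar(text).items():
        # Assumes a single-character destination, as in this example.
        print(f"{label}: 0x{code:04X} -> U+{ord(char):04X} {unicodedata.name(char)}")
```

The first entry resolves to SOFT HYPHEN and the second to HYPHEN-MINUS, which is why copied text differs even though the glyph drawn on the page is the same.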
It is (as always with fonts) more complex than that because (1)
different Unicode slots share the same shape and (2) we have some
(already) old hyphen patching code for messy fonts (which is kind of bad
anyway).

We actually want all these hyphens to have the right tounicode even if
they share shapes (i already had some comment about looking into that
but never ran into a font that needed it).

So, after some experimenting i decided to solve that in a different way
(lmtx only because there i have more control) ... i need to run some
checks and then do an upload so that you can test (also other files if
possible).
 
I finally found the time to do some extended testing on this, and it seems that for my use case LMTX version 2021-06-09 behaves as I would expect: hyphens are now extracted as hyphens.

Thanks a lot for your implementation, Hans!

Cheers,
Christoph