composing commands
Hi,
As I'm going over the commands in lmtx, I wonder if we should keep
\c \d \k \r \u \v
\" \' \` \^
etc ... the ones that make 'composed characters'. I think that anyone who needs them uses utf. They can be in (say) m-oldschool.mkxl or so.
Objections? Hurt feelings? Sentiments?
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
I agree with the optional alternative. If LuaMetaTeX goes (c)leaner, it can
get rid of obsolete constructions made for a pre-Unicode world. Some are
already angry with primitives gone and I think that's positive, but it's
only my opinion.
Jairo :)
___________________________________________________________________________________ If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context webpage : http://www.pragma-ade.nl / http://context.aanhet.net archive : https://bitbucket.org/phg/context-mirror/commits/ wiki : http://contextgarden.net
___________________________________________________________________________________
On 2/5/2021 5:49 PM, Jairo A. del Rio wrote:
> I agree with the optional alternative. If LuaMetaTeX goes (c)leaner, it can get rid of obsolete constructions made for a pre-Unicode world. Some are already angry with primitives gone and I think that's positive, but it's only my opinion.
I never saw angry mails here about primitives that are gone. Context commands seldom go away; in the transition to mkiv some encoding stuff became obsolete and, after many years, might have been removed from mkiv because no one needed or used it anyway. New stuff gets added, old stuff stays or gets improved.
Primitives are an engine thing and there are differences between engines, for sure. When context overloads primitives (this happens in a few cases) the original is often available as \normal<primitive>. There is a core set of primitives (original tex; luametatex has dropped some backend related ones and nilled some prefixes that we never used in context anyway), some etex enhancements brought new primitives (some make no sense in the luametatex universe but one can always fake something), some auxiliary pdftex primitives never were available in lua(meta)tex because we have lua, and from omega/aleph we ended up with nearly nothing (luametatex dropped some useless direction stuff). So: no real harm done. And then of course luametatex brought some new primitives.
When overloadmode is enabled one cannot redefine primitives and/or macros, depending on what property they have gotten (I'm now down to four pages of todo).
Hans
On 2/5/21 5:38 PM, Hans Hagen wrote:
> etc ... the ones that make 'composed characters'. I think that anyone who needs them uses utf . They can be in (say) m-oldschool.mkxl or so.
> Objections? Hurt feelings? Sentiments?
No hurt feelings, but I know that in my bib files, there are a couple of old entries that still have these weird composed characters. So I'm fine with upgrading, but it would be nice if this could fail gracefully, with a nice and informative error message...
All best
Thomas
On 2/5/2021 6:19 PM, Thomas A. Schmitz wrote:
> No hurt feelings, but I know that in my bib files, there are a couple of old entries that still have these weird composed characters. So I'm fine with upgrading, but it would be nice if this could fail gracefully, with a nice and informative error message...
Is this ok for you?
tex error > tex error on line 6 in file ./oeps.tex: Undefined control sequence
On 5 Feb 2021, at 22:30, Hans Hagen wrote:
> […] Is this ok for you?
> tex error > tex error on line 6 in file ./oeps.tex: Undefined control sequence
> \v
> 4
> 5 \starttext
> 6 >> \v
> 7 \stoptext
> 8
In that case the error message could say
tex error > tex error on line 6 in file ./oeps.tex: Undefined control sequence: if you really mean it, then
tex error > add \usemodule[oldschool] at the beginning of your file…
On 2/5/2021 11:25 PM, Otared Kavian wrote:
> In that case the error message could say
> tex error > tex error on line 6 in file ./oeps.tex: Undefined control sequence: if you really mean it, then
> tex error > add \usemodule[oldschool] at the beginning of your file…
Sure but then we can as well keep them -) But maybe I will first redo them in a less old school way.
The reason I ask mostly relates to trying to classify such commands. It is easy to see that one should not redefine \relax (primitive) or \framed (permanent core macro), so we can protect these against overloading, while \temp is supposed to be mutable (commands like \foo_bar are already hidden from the user, so these I can skip). There are (local) cases where we have \3 defined, but what should it be otherwise? And commands like \n? Commands like \NC have a meaning that depends on the specific environment, so they change meaning but then still need to be protected against overload.
Most is already dealt with so I'm now going through left-overs.
Hans
On 2/5/21 10:30 PM, Hans Hagen wrote:
> Is this ok for you?
> tex error > tex error on line 6 in file ./oeps.tex: Undefined control sequence
Otared has already replied what I was thinking: would it be possible, for a certain period of time, to not stop compilation, but flag the issue on the console? When you say that such changes won't happen suddenly, this would be a wonderful compromise for me. My consolidated bibtex database began its life many many years ago, when bibtex was still pure 7bit, and you had to pepper your sources with all sorts of silly workarounds in order to get accents and umlauts and whatever you wanted. Those were the days - barefoot through the snow, and it was uphill both ways...
Thomas
On 2/6/2021 12:01 AM, Thomas A. Schmitz wrote:
> Otared has already replied what I was thinking: would it be possible, for a certain period of time, to not stop compilation, but flag the issue on the console? When you say that such changes won't happen suddenly, this would be a wonderful compromise for me. My consolidated bibtex database began its life many many years ago, when bibtex was still pure 7bit, and you had to pepper your sources with all sorts of silly workarounds in order to get accents and umlauts and whatever you wanted. Those were the days - barefoot through the snow, and it was uphill both ways...
So how did you do the greek then?
I played a bit with an alternative implementation (same commands): less hash and mem, delegate more to lua. That way we can issue such a message at less cost (once only, of course). There is of course a price to pay:
% .16 sec per 100000 \"u : old method (more mkii-ish)
% .25 sec per 100000 \"u : new method (more lmtx-ish)
(actually with mkii in pdftex we need .3 seconds, xetex freezes with 100K but needs 0.53 for 10K, luatex needs 0.18) but I'm sure you don't care much about that, so I just implemented a variant with a warning which takes .19 seconds per 100K, so it's a nice compromise. (Probably less than half of that time on your machine.)
system > instead of old school '\"u' you can input the utf sequence ü
(The old school narrative is that context is slow, which is why we need to keep an eye on performance, right?)
Hans
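The warn-once behaviour described in this message can be pictured outside TeX. The Python sketch below is only an illustration and not the LMTX implementation: the function name `old_school_accent`, the `_warned` set, the `messages` list and the small accent table are all assumptions made for this example. Only the wording of the console message is taken from the mail.

```python
import unicodedata

# Hypothetical table: old-school accent command -> Unicode combining mark.
COMBINING = {
    '"': "\u0308",  # diaeresis
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
}

_warned = set()   # (command, base) pairs that were already reported
messages = []     # console messages, collected in a list for inspection


def old_school_accent(command, base):
    """Return the composed character and report each combination only once."""
    composed = unicodedata.normalize("NFC", base + COMBINING[command])
    key = (command, base)
    if key not in _warned:
        _warned.add(key)
        messages.append(
            "system > instead of old school '\\%s%s' you can input "
            "the utf sequence %s" % (command, base, composed)
        )
    return composed


# 100000 uses of the same old-school sequence produce a single message.
for _ in range(100000):
    old_school_accent('"', "u")
```

Keeping the "already warned" state in a set means the cost of the warning is paid once per combination, which is the trade-off the timings above are about.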
On 2/5/2021 6:19 PM, Thomas A. Schmitz wrote:
> No hurt feelings, but I know that in my bib files, there are a couple of old entries that still have these weird composed characters. So I'm fine with upgrading, but it would be nice if this could fail gracefully, with a nice and informative error message...
btw, something like that will never happen suddenly ... more a matter of declaring them obsolete, maybe move them to a module that we could still load by default and later maybe on demand
concerning bibtex files, that's another story ... in order to deal well with sorting etc quite a bit of sanitizing already takes place there (alan and i spent quite a bit of time on that); also there are ways to define extra only-used-in-bibtex commands, so we could actually just define them for bib stuff only
it's more about "what are the current habits" ... we have commands like \" (which is kind of intuitive) but \r and \v and such fall in the category, and there are more kinds of accents than we currently have commands for anyway
a similar discussion (and we already exchanged some mails about that) are named glyphs ... we have quite some for latin, greek, cyrillic (like \eacute) but how about the rest of unicode
makes me wonder if we should have
\chr{e"a'}
producing
ëá
(using real combinings is ok already) which is trivial to implement so for the fun of it i might as well add that; i think most who deal with languages that have characters other than ascii will input in the most natural way so we're only talking of escapes for those who see accents and such as noise (yes we do have accents in dutch)
(in mkii we already had utf so then we actually did much of the transition but mkii is stone age in terms of software)
Hans
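To make the \chr{e"a'} idea concrete, here is a minimal Python sketch of the underlying Unicode mechanics. It is not how ConTeXt implements \chr. It assumes that each ASCII marker follows its base letter, as in the example in this message, and the helper name `compose` and the `MARKERS` table are invented for illustration.

```python
import unicodedata

# Assumed ASCII markers, each mapped to a real combining character.
MARKERS = {
    '"': "\u0308",  # combining diaeresis
    "'": "\u0301",  # combining acute accent
    "`": "\u0300",  # combining grave accent
    "^": "\u0302",  # combining circumflex accent
    "~": "\u0303",  # combining tilde
}


def compose(spec):
    """Turn a spec such as e"a' into precomposed characters where they exist."""
    # Replace every marker by its combining character ...
    combined = "".join(MARKERS.get(ch, ch) for ch in spec)
    # ... and let NFC normalization merge each base + mark pair.
    return unicodedata.normalize("NFC", combined)


print(compose("e\"a'"))  # ëá
```

Because the markers are turned into real combining characters first, the same function also accepts input that already contains combining characters.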
Hans Hagen said:
> makes me wonder if we should have
> \chr{e"a'}
> producing
> ëá
> (using real combinings is ok already) which is trivial to implement so for the fun of it i might as well add that
I think that it would be useful. I use Unicode characters extensively in
my ConTeXt input, but only because I edit it in Emacs and can set up
keymaps that map to the Unicode characters in a way that I can actually
remember. I think that this would add an easily remembered way for people
to add combining characters to their documents. Sometimes a slightly more
verbose way to do something is helpful when it is more easily remembered.
(Honestly, I can't remember the hex codes for any Unicode characters after
you get out of the range that maps to plain ASCII.)
-- T. Kurt Bond, tkurtbond@gmail.com, https://tkurtbond.github.io
On 2/6/2021 11:41 PM, T. Kurt Bond wrote:
> I think that it would be useful. I use Unicode characters extensively in my ConTeXt input, but only because I edit it in Emacs and can set up keymaps that map to the Unicode characters in a way that I can actually remember. I think that this would add an easily remembered way for people to add combining characters to their documents. Sometimes a slightly more verbose way to do something is helpful when it is more easily remembered.
I am upgrading this mechanism anyway. First of all, the short commands will be equivalents to more verbose ones.
\withgrave       {a} == \\`{a}
\withacute       {a} == \\'{a}
\withcircumflex  {a} == \\^{a}
\withtilde       {a} == \\~{a}
\withmacron      {a} == \\={a}
\withbreve       {e} == \\u{e}
\withdot         {c} == \\.{c}
\withdieresis    {e} == \\"{e}
\withring        {u} == \\r{u}
\withhungarumlaut{u} == \\H{u}
\withcaron       {e} == \\v{e}
\withcedilla     {e} == \\c{e}
\withogonek      {e} == \\k{e}
Did I miss one?
Then we can deprecate the short ones (keep them at a low profile, with permission to overload). After all, I don't expect someone who needs lots of them to use these commands, so more verbose is better then. As I already mentioned, in bib files they are treated differently already.
The low level helper is \chr, that can be used as
\chr {à}       \chr {á}       \chr {ä}
\chr {`a}      \chr {'a}      \chr {"a}
\chr {a acute} \chr {a grave} \chr {a umlaut}
\chr {aacute}  \chr {agrave}  \chr {aumlaut}
(I can add more of the verbose ones, like {cyrillic a}, if really needed.) It means that we can also declare \eacute etc deprecated (these verbose names date from \MKII: encoding neutral labels, utf handling, remapping to backend encodings etc, but we don't need that and I'm not sure if anyone ever used those long names). Again: deprecated, not removed (yet).
Then there is the question what to do with \AE and \ij and such ... these were used to enforce specific ligatures into a file, assuming that the font has them, but nowadays that's the job of a font handler (script language control). We can keep them but assume them legacy. They normally don't belong in input. (Being Dutch I actually never used \IJ or \ij).
Now, we can assume that when your language needs characters with accents you use a font that has them. In MKIV and LMTX one can enable a checker
\enabletrackers[fonts.missing]
\enabletrackers[fonts.missing=replace]
\enabletrackers[fonts.missing=remove]
but in LMTX it's upgraded with more clever replacements (Jano will document that + more about checking missing stuff in the wiki).
So, in LMTX we have more options (maybe I'll backport that to MKIV)
\checkmissingcharacters    \enabletrackers[fonts.missing]
\removemissingcharacters   \enabletrackers[fonts.missing=remove]
\replacemissingcharacters  \enabletrackers[fonts.missing=replace]
\handlemissingcharacters   \enabletrackers[fonts.missing={decompose,replace}]
the last one will inject decomposed characters into the list when the font lacks the real thing. The replacements are visualized similarly to MKIV but adapt to the style.
Hans
(no upload yet)
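The idea behind the decompose-then-replace combination can be sketched as follows. This is a Python illustration of the general principle only: the function `handle_missing`, the `available` set standing in for a font's glyph coverage, and the choice of U+FFFD as the replacement are assumptions of this sketch, not part of LMTX.

```python
import unicodedata

REPLACEMENT = "\ufffd"  # assumed placeholder for characters that stay missing


def handle_missing(text, available):
    """Decompose characters the 'font' lacks; replace what still cannot be shown."""
    result = []
    for ch in text:
        if ch in available:
            result.append(ch)
            continue
        # Canonical decomposition, e.g. U+1E41 -> m + combining dot above.
        parts = unicodedata.normalize("NFD", ch)
        if parts != ch and all(p in available for p in parts):
            result.extend(parts)
        else:
            result.append(REPLACEMENT)
    return "".join(result)


# A toy 'font' with ASCII letters and two combining marks only.
font = set("abcdefghijklmnopqrstuvwxyz") | {"\u0307", "\u0308"}
print(handle_missing("\u1e41\u00e4\u00f8", font))
```

In this toy setup the first two characters are injected as base letter plus combining mark, while ø has no canonical decomposition and therefore ends up as the replacement character.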
Hans,
For me, at least, having these covered would be useful:
acute á
double acute ő
grave à
double grave ȍ
circumflex â
circumflex below ḙ
diaeresis ä
tilde ã
tilde below ḭ
macron ā
line below ḵ
cedilla ç
comma below ş
hook ȥ
ring above å
ring below ḁ
dot above ṁ
middle dot ŀ
dot below ṃ
breve ă
inverted breve ȃ
caron ǩ
stroke ø
Best, Richard
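Most entries in a list like this correspond to a Unicode combining mark, so a name-based interface can be sketched in a few lines. The Python below is an illustration only: `with_accent` and the `MARKS` table are hypothetical names, and the sketch says nothing about how the \with... commands are implemented in ConTeXt. Entries that are not canonical compositions in Unicode (hook, stroke, middle dot) are left out on purpose.

```python
import unicodedata

# Accent name -> combining mark (code points from the Unicode standard).
MARKS = {
    "acute": "\u0301",
    "double acute": "\u030b",
    "grave": "\u0300",
    "double grave": "\u030f",
    "circumflex": "\u0302",
    "circumflex below": "\u032d",
    "diaeresis": "\u0308",
    "tilde": "\u0303",
    "tilde below": "\u0330",
    "macron": "\u0304",
    "line below": "\u0331",
    "cedilla": "\u0327",
    "ring above": "\u030a",
    "ring below": "\u0325",
    "dot above": "\u0307",
    "dot below": "\u0323",
    "breve": "\u0306",
    "inverted breve": "\u0311",
    "caron": "\u030c",
    "ogonek": "\u0328",
}


def with_accent(name, base):
    """Compose base + named accent; the result stays two code points
    when Unicode has no precomposed form."""
    return unicodedata.normalize("NFC", base + MARKS[name])


for name, base in [("double acute", "o"), ("line below", "k"), ("caron", "k")]:
    print(name, with_accent(name, base))
```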
Hi Hans,
Thanks for the new composing commands. I made several tests and everything works great. I guess when you say something like
\withgrave {a} == \\`{a}
you mean
\withgrave {a} == \`{a}
(this is what I tested…). Regarding the characters æ and œ, the command \chr produces them correctly, that is
\chr{ae} \chr{AE} \chr{oe} \chr{OE}
produce
æ Æ œ Œ
as expected. If you think these commands are to stay, please tell me if they have to be on the wiki.
Best regards: Otared
On 2/9/2021 11:03 AM, Otared Kavian wrote:
> If you think these commands are to stay, please tell me if they have to be on the wiki.
I uploaded a new version. You can run s-characters-combinations.mkxl to see what's there. I added enough to be able to meet Richard's demands. Some in that list are a bit tricky, like
hook ȥ
stroke ø
because they are not really accents. The characters with hooks are not officially built from two characters. There is no composition info so I added that myself, but in such a way that it doesn't interfere with regular input.
There might be more exceptions that we need to deal with, but I'll just wait till someone brings it up (read: it's up to users to come with additional specs).
Hans
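The remark that characters with a hook or stroke carry no composition info can be checked against the Unicode character database. The Python check below is illustrative only; the helper name `canonical_parts` is made up for this sketch and has nothing to do with the ConTeXt sources.

```python
import unicodedata


def canonical_parts(ch):
    """Return the canonical decomposition of ch as a list of characters,
    or an empty list when the Unicode database records none."""
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0].startswith("<"):
        return []  # nothing, or only a compatibility mapping such as <compat>
    return [chr(int(f, 16)) for f in fields]


for ch in "\u00e5\u00f8\u0225":  # å, ø, ȥ
    print(ch, ["U+%04X" % ord(p) for p in canonical_parts(ch)])
```

For å the database lists a base letter plus a combining ring, while for ø and ȥ it lists nothing, which is why any mapping for those has to be supplied by hand.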
Is it possible to add \withinvertedbrevebelow to the list?
Hi Thomas,
> No hurt feelings, but I know that in my bib files, there are a couple of old entries that still have these weird composed characters. So I'm fine with upgrading, but it would be nice if this could fail gracefully, with a nice and informative error message...
Okay, here is a secret. When your bib file is read, those magic accent placement commands are not used at all:
\starttext
\startbuffer[bib]
@article{test,
    title  = {\"Articl\`e \O n\k{e}},
    author = {Th\^omas},
    year   = {2001},
}
\stopbuffer
\usebtxdataset[bib.buffer]
\ctxlua{inspect(publications.datasets.default.luadata.test)}
\placebtxrendering[method=dataset,pagestate=start]
\stoptext
They have magically disappeared, thanks to the fact that Alan and I spent quite a bit of time on brewing the magic potion when we redid the bib stuff. So, although we will keep the shortcuts, you'd probably never have noticed them being gone.
Now the question is: what can we expect in old bib files that we {\em don't} handle.
Hans
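What happens to such a title can be imitated in a few lines. The following Python sketch is not the ConTeXt bibliography code. It is a simplified illustration, with a hypothetical `detex` function and a deliberately small command table, of how TeX accent commands in a BibTeX field can be turned into UTF-8 before any macro is expanded.

```python
import re
import unicodedata

# Accent command -> combining mark (small, illustrative subset).
ACCENTS = {
    '"': "\u0308", "'": "\u0301", "`": "\u0300", "^": "\u0302",
    "~": "\u0303", "c": "\u0327", "k": "\u0328", "v": "\u030c",
    "u": "\u0306", "r": "\u030a",
}
# Commands that stand for a character of their own.
SYMBOLS = {"O": "\u00d8", "o": "\u00f8", "AE": "\u00c6", "ae": "\u00e6"}

# \"A or \"{A}: symbol accent, argument with or without braces.
SYMBOL_ACCENT = re.compile(r"""\\(["'`^~])(?:\{([A-Za-z])\}|([A-Za-z]))""")
# \k{e}, \c{c}: letter accent with a braced argument.
LETTER_ACCENT = re.compile(r"\\([ckvur])\{([A-Za-z])\}")
# \O, \ae: symbol command; one following space is swallowed, as TeX does.
SYMBOL = re.compile(r"\\(AE|ae|O|o)(?![A-Za-z]) ?")


def _accent(match):
    """Compose the captured base letter with the captured accent."""
    base = next(g for g in match.groups()[1:] if g)
    return unicodedata.normalize("NFC", base + ACCENTS[match.group(1)])


def detex(field):
    """Replace a few TeX accent and symbol commands by Unicode characters."""
    field = SYMBOL_ACCENT.sub(_accent, field)
    field = LETTER_ACCENT.sub(_accent, field)
    return SYMBOL.sub(lambda m: SYMBOLS[m.group(1)], field)


print(detex(r"\"Articl\`e \O n\k{e}"))
print(detex(r"Th\^omas"))
```

Anything outside the small tables is left untouched, which mirrors the open question at the end of the message: a converter like this only handles the commands it knows about.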
Hi Hans,
I agree that nowadays most composed characters can be input directly from the keyboard (at least judging from my experience with MacOS), but nevertheless it may happen that one uses an old file as an input source file (for instance extracting a math exercise from a database) in which there are old-fashioned composed characters. It would be hard to go back and replace those characters in each file.
Besides, with the traditional plain TeX composed characters something like
\"c
would give a correct result (the character c with a kind of umlaut on top of it), but this cannot be input from the keyboard (and maybe it does not exist at all in UTF…). (Actually I just tried \"c with LMTX and mkiv and it does not give what is expected from TeX… I am sure it did work some years ago :-) )
If, as you suggest, such composed characters may be used at the cost of saying at the beginning of one's file:
\usemodule[oldschool]
then there is no real harm in removing composing commands, although I am not an enthusiastic supporter of removing them.
Best regards: Otared K.
On 2/5/2021 11:21 PM, Otared Kavian wrote:
> Hi Hans,
>
> I agree that nowadays most composed characters can be input directly from the keyboard (at least judging from my experience with MacOS), but nevertheless it may happen that one uses an old file as an input source file (for instance extracting a math exercise from a database) in which there are old-fashioned composed characters. It would be hard to go back and replace those characters in each file. Besides, with the traditional plain TeX composed characters something like
>
> \"c

indeed, not in unicode ... so unlikely dealt with in fonts (i could of course support it as in mkii but in over a decade no one complained so ...)

> would give a correct result (the character c with a kind of umlaut on top of it), but this cannot be input from the keyboard (and maybe it does not exist at all in UTF…). (Actually I just tried \"c with LMTX and mkiv and it does not give what is expected from TeX… I am sure it did work some years ago :-) )

never call a diaeresis an umlaut and vise versa ... some texies can get very emotional about that ... and you don't want to know how long-winded and boring discussions about the distance between the base character and the accent can/has/might be (i remember some discussion about the umlaut being positioned lower given its historical origin, it being tiny letters) ... often discussions focused on computer modern because that was kind of under control (we had plenty of variants for each language, which is interesting because they actually could be in one font, but that (a new encoding as part of texgyre) happened just before we all went unicode and before the european union became larger)

> If, as you suggest, such composed characters may be used at the cost of saying at the beginning of one's file: \usemodule[oldschool] then there is no real harm in removing composing commands, although I am not an enthusiastic supporter of removing them.

Well, that's why I ask.
Hans
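Whether a base letter plus accent exists as a single Unicode character, as asked above for c with a diaeresis, can be checked with a small Python sketch. The helper name `precomposed` is made up for this illustration; the check assumes that NFC normalization yields a single code point exactly when a precomposed character is available, which holds for these two pairs.

```python
import unicodedata
from typing import Optional

DIAERESIS = "\u0308"  # COMBINING DIAERESIS


def precomposed(base: str, combining: str) -> Optional[str]:
    """Return the single precomposed character for base+combining, if any."""
    composed = unicodedata.normalize("NFC", base + combining)
    # If NFC could not merge the pair, it is still two code points long.
    return composed if len(composed) == 1 else None


if __name__ == "__main__":
    for base in ("u", "c"):
        result = precomposed(base, DIAERESIS)
        if result is None:
            print(base, "+ diaeresis: no precomposed character")
        else:
            print(base, "+ diaeresis:", unicodedata.name(result))
```

For a pair without a precomposed form, the character can still be written as the base letter followed by the combining mark; how well that renders then depends on the font.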
On Fri, Feb 5, 2021 at 11:47 PM Hans Hagen
> never call a diaeresis an umlaut and vise versa ...

hm
Latin is vice versa (or versa vice)
Italian is viceversa
English is vice versa
https://forvo.com/word/vice_versa/
I think that the Latin one is the classic (or "restituta") pronunciation; the ecclesiastic should be more or less like the Italian.

-- luigi
participants (8)
- Hans Hagen
- Jairo A. del Rio
- luigi scarso
- Marcus Vinicius Mesquita
- Otared Kavian
- Richard Mahoney
- T. Kurt Bond
- Thomas A. Schmitz