This is a copy of a message I sent to the LaTeX3 list some time ago, to which I got no answer (not a rare thing on that list). So I am posting it here, where it also belongs.

There have been many discussions in the past about how to create a really multilingual program. One of the points was to allow typing control sequences (cs) in Russian (say), but we cannot simply make the \catcode equal to 11, because then the character would be passed transparently to the dvi (i.e., the font), which only works if the input and output encodings match. Can we simply add a \cscode to pdftex? Such a \cscode would be equal to 1 or 0, according to whether the character is allowed in the name of a cs or not (more precisely, a character with \cscode 0 would make the previous one the last of the cs, or, if it is the first one, it would be the only one in the name of the cs). In LaTeX, the cscodes would be set when selecting the input encoding.

In order not to break tons of existing code, we should still allow catcode 11 characters to form control names. Thus, wherever we now have (approximately, I have not yet read the sources)

if(catcode=11)

we replace it with

if(catcode=11 or cscode=1) %If cscode=1 then, even if catcode=active, the character works as a letter when building the name of a control word

We initially set all cscodes equal to 0, so nothing changes for existing code, while from now on we will be able to type control sequences in our own alphabet (Cyrillic, Greek, ...), even if those characters are active.

-- Javier A.
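The scanning rule proposed above can be modelled outside TeX. The following is only a rough Python sketch, not pdfTeX's actual code: the function scan_cs_name, the dictionaries and the constants are all hypothetical names invented for illustration, and \cscode itself is a proposal, not an existing primitive.

```python
import string

# Category codes used in this toy model (values as in TeX).
LETTER, OTHER, ACTIVE = 11, 12, 13

def scan_cs_name(text, pos, catcode, cscode):
    """Collect the control-sequence name starting at text[pos],
    i.e. right after the escape character. Hypothetical model only."""
    def is_name_char(ch):
        # Current rule: catcode == 11.
        # Proposed rule: catcode == 11 or cscode == 1.
        return catcode.get(ch, OTHER) == LETTER or cscode.get(ch, 0) == 1

    if pos >= len(text):
        return ""
    if not is_name_char(text[pos]):
        # First character is not a name character: it alone is the name.
        return text[pos]
    end = pos
    while end < len(text) and is_name_char(text[end]):
        end += 1
    return text[pos:end]

# ASCII letters have catcode 11; é is active, as inputenc would make it.
catcode = {c: LETTER for c in string.ascii_letters}
catcode["é"] = ACTIVE

print(scan_cs_name(r"\née {born}", 1, catcode, {}))        # all cscodes 0
print(scan_cs_name(r"\née {born}", 1, catcode, {"é": 1}))  # é has cscode 1
```

With every cscode at 0 the name stops before the active é, as it does today; once é has cscode 1 the whole word is collected even though é stays active.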
Dear Javier, copy all --

Because it is difficult for me to think in Russian (or even in Greek), would it be possible for you to give an example of your desired behaviour using (say) French? I /imagine/ that what you are saying is that if we take é (<e-acute>) as an example of a character that could not normally form part of a control sequence, you would like to be able to do the following:

\cscode `\é = 1
\catcode `\é = \active
\def é{\'e}
\def \née {born}

and that this solution is preferred to

\catcode `\é = \catcode `\e
\def \née {born}

because the latter does not allow the

\catcode `\é = \active
\def é{\'e}

functionality that \cscode does. Is this correct?

** Phil.
--------
because the latter has implications
_______________________________________________ ntg-pdftex mailing list ntg-pdftex@ntg.nl http://www.ntg.nl/mailman/listinfo/ntg-pdftex
Yes, I mean exactly that. I chose the example of Russian because in a language that uses the Latin script, as is the case for French or Spanish, it really isn't that annoying not to be able to use accented letters or some particular letter of the alphabet (\~n, \c c, ...), but if you are using another script it is very likely that you need all the letters of your alphabet to be active, and so you are forced to use the Latin alphabet to build the names of control sequences.

You said that it is difficult for you to think in Russian. Well, neither can I, but people using TeX in those countries **must** think in another alphabet, and it would really be an improvement if they were able to write control sequences using their natural script.

-- Javier A.
Hi Javier,

Let me start by saying this: if your font is set up correctly for your language, not all characters really 'have to' be active. Inputenc.sty makes them active so that it can support many encodings, but there is no real need for that.

If (for instance) é is in slot 233 of your input encoding, and you use a font encoding that has it in slot 233 as well, then it can simply be a 'letter', and you could happily use it in a control sequence.

In the future, even with a different font encoding, the need for all these characters to be active will disappear completely, because future versions of pdfTeX will have a separation between characters and font glyphs, with a tunable remapping stage in between.

Adding a \cscode primitive may be interesting nonetheless, but as Phil said: there will be implications. The next release (1.40) of pdftex is in feature freeze, so there is no chance of adding it there, but I'll do some experiments in LuaTeX to see just how many implications there are and whether they can be overcome without breaking compatibility.

Cheers, Taco
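The slot-matching condition can be checked numerically. The sketch below uses Python's standard codecs to look up the byte an 8-bit input encoding assigns to a character; the helper slot is a hypothetical name, and cp1251 is used merely as a stand-in for "some other slot layout" (an assumption for illustration, since no real font encoding is consulted here).

```python
def slot(char, encoding):
    """Return the single byte value that an 8-bit encoding assigns to char."""
    encoded = char.encode(encoding)
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single byte in {encoding}")
    return encoded[0]

# é in Latin-1 input sits in slot 233, the slot mentioned above.
print(slot("é", "latin-1"))

# Cyrillic а: two common 8-bit layouts disagree on the slot, so a byte typed
# in one layout would select the wrong glyph in a font laid out like the other.
print(slot("а", "koi8_r"), slot("а", "cp1251"))
```

When the two numbers agree, the character can be a plain letter; when they differ, something (an active character, or a remapping stage) has to translate between them.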
Taco Hoekwater wrote:

> Hi Javier,
>
> Let me start by saying this: If your font is set up correctly for your language, not all characters really 'have to' be active. Inputenc.sty makes them active so that it can support many encodings, but there is no real need for that.
>
> If (for instance) é is in slot 233 of your input encoding, and you use a font encoding that has it in slot 233 as well, then it can simply be a 'letter', and you could happily use it in a control sequence.

Yes, I know that, but it is very likely, especially with some non-Latin scripts, that your input encoding does not match the font encoding. You may argue that you can remap fonts, but be realistic: how many people actually know how to do that? (And besides, it is still much simpler to let the inputenc/fontenc packages do the work.)

> In the future, even with a different font encoding, the need for all these characters to be active will disappear completely because future versions of pdfTeX will have a separation between characters and font glyphs, with a tunable remapping stage inbetween.

That is really good news.

Cheers, Javier A.
participants (3)
- Disgusted of Tunbridge Wells
- Javier Múgica de Rivera
- Taco Hoekwater