Hello Idris,
I didn't see any reply to this e-mail you sent two weeks ago, so I
wanted to give it a try:
In luatex can I make a definition such that the string
U004C U0303 (l ̃)
is always treated as l with tilde above, taking into account italics and
without using \~l (which does not work in, e.g., footnotes)?
What you want here is to support the Unicode combining characters,
which isn't straightforward in TeX because according to the Standard,
they come after the base letter they modify, while TeX's accent commands
are, of course, typed before. So you can't simply make the combining
characters active and equivalent to the appropriate accent macros.
In traditional TeX, it would have been tempting to make the base letter
active instead, but this has a lot of drawbacks, and LuaTeX offers many
other possibilities. Here I've used a set of macros that Taco had
written a couple of months ago in response to a question by Thomas
Schmitz (see http://www.ntg.nl/pipermail/ntg-context/2007/027095.html).
The attached file implements the transformation of the sequence into "\buildtextaccent\texttilde l",
which I hope gives the expected result in every circumstance. I've done
it only for the small letter, but of course it's easy to adapt to add
the capital letter as well.
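To make the reordering concrete, here is a rough sketch in Python, outside TeX altogether. It is not the attached file and not Taco's macros, only an illustration of the idea that the accent has to move in front of the base letter; the names COMBINING_TILDE and rewrite_l_tilde are made up for this example.

```python
# Illustration only: a plain string rewrite, not the LuaTeX mechanism
# used in the attached file. All names here are hypothetical.
COMBINING_TILDE = "\u0303"  # U+0303 COMBINING TILDE


def rewrite_l_tilde(text: str) -> str:
    """Replace 'l' followed by U+0303 with the accent-first TeX form."""
    # In Unicode the combining mark follows its base letter, whereas the
    # TeX accent macro must precede the letter, so the order is swapped.
    return text.replace("l" + COMBINING_TILDE, r"\buildtextaccent\texttilde l")


if __name__ == "__main__":
    print(rewrite_l_tilde("l" + COMBINING_TILDE))
```

Adding the capital letter would only need a second replacement for "L" followed by U+0303.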
Finally, I wish to clarify a small misunderstanding: you quoted the
two lines below:
LATIN CAPITAL LETTER L WITH TILDE;004C 0303
LATIN SMALL LETTER L WITH TILDE;006C 0303
with the comment "The proposal is still under consideration for
Lithuanian and not yet in Unicode". Actually it is already encoded in
Unicode; that is, all the characters you need are present with the
appropriate semantics, and you can accurately represent a small l with
tilde in Unicode; only, you have to use two characters (U+006C followed
by U+0303). The only thing that will be added to Unicode in that
respect is the *name* of those strings (I guess you took those two lines
from the data files for Unicode version 5.1.0, in beta stage). The
corresponding characters, though, will not be added to Unicode,
according to a decision which was made several years ago (I could
trace it back to a discussion at the Unicode Technical Committee in
October 1999, but I don't know the details). The idea is that it can
already be represented as a sequence of characters, and the Unicode
Consortium does not wish to make the set of alphabetic characters
explode with diacritics.
In spite of this, Unicode still wishes to acknowledge that some
unencoded accented letters are important in some languages, and provides
names for the character sequences representing them, as it does for
all the encoded characters. The relevant document that explains this is
Unicode Standard Annex #34 (http://www.unicode.org/reports/tr34/).
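If you want to check for yourself that small l with tilde exists only as a two-character sequence, here is a small sketch using Python's standard unicodedata module (my choice of tool, nothing to do with LuaTeX). It compares l + U+0303 with n + U+0303, which does have a precomposed form (U+00F1); the variable names are only for illustration.

```python
import unicodedata

l_tilde = "l\u0303"  # U+006C followed by U+0303
n_tilde = "n\u0303"  # U+006E followed by U+0303, for comparison

# NFC composes a sequence into a single character when one is encoded.
composed_l = unicodedata.normalize("NFC", l_tilde)
composed_n = unicodedata.normalize("NFC", n_tilde)

# The combining character itself is encoded, with its own name.
mark_name = unicodedata.name("\u0303")

if __name__ == "__main__":
    print(mark_name)
    print([f"U+{ord(c):04X}" for c in composed_l])
    print([f"U+{ord(c):04X}" for c in composed_n])
```

The l sequence stays as two code points after normalization, while the n sequence collapses to one.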
Arthur