On 09/29/2012 02:35 PM, Hans Hagen wrote:
On 29-9-2012 01:41, Simo Ojala wrote:
Hans Hagen
On 09/28/2012 11:46 AM, Hans Hagen wrote:
On 27-9-2012 21:27, Simo Ojala wrote:
This is a problem I originally posted on TeX/StackExchange. However, since I have not had any luck finding a solution, I am posting it here too. I am confident that somebody here will know the answer.
http://tex.stackexchange.com/questions/73970/problem-with-context-mkiv-hebre...
"Since I last played with the latest ConTeXt MkIV, a new feature has been introduced. It now seems to combine Hebrew characters automatically into ligatures when possible. For example, if I have a word with the following two characters:
U+05D5 (HEBREW LETTER VAV) U+05BC (HEBREW POINT DAGESH OR MAPIQ)
ConTeXt will combine these to:
U+FB35 (HEBREW LETTER VAV WITH DAGESH)
However, I need to disable this feature for a number of reasons. For example, it breaks my little database query, because the query key is changed before(?) the macro gets it.
So I would be grateful if somebody knew how to turn this off, and perhaps also what has changed."
It depends on the font ... normally you can disable this by *not* using the mark and mkmk features
Hans
Ok, I have now tried turning off all kinds of features without luck, so I put together a minimal test case. I suspect that something more is needed than just turning off some font features. However, my ConTeXt skills are very limited, so I may be wrong.
The goal is that the word passed from the ConTeXt file stays as written and yields the Unicode characters U+5e1, U+5d5, U+5bc and U+5e1. This is what already happens when the word is written in the Lua file.
Simo
PS: In case this matters, my ConTeXt MkIV version is "2012.09.23 12:40". It should be the latest for Ubuntu 12.04 LTS Precise Pangolin available in Adam Reviczky's PPA.
%% testcase.tex
\definefontfeature[hebrew][arabic][script=hebr]
\definefont[dejavusans][name:dejavusans*hebrew at 26pt]
\setupdirections[bidi=global]
\starttext \dejavusans
\def\Macro#1{\directlua{
  dofile(resolvers.findfile("testcase.lua"))
  userdata.testfunction("#1")
}}
\Macro{סוּס}
\blank[1cm]however, we can still color these independently\blank[0.5cm]
\color[red]{ס}\color[green]{ו}\color[blue]{ּ}\color[yellow]{ס}
\stoptext
-- testcase.lua
userdata = userdata or {}

function userdata.testfunction(word)
    tex.sprint("\\blank[1cm]word passed by macro\\blank[0.5cm]")
    for i = 1, unicode.utf8.len(word) do
        tex.sprint("U+" .. string.format("%x", unicode.utf8.byte(word, i)) .. ": " .. unicode.utf8.sub(word, i, i) .. "\\par")
    end
    tex.sprint("\\blank[1cm]word written in lua file\\blank[0.5cm]")
    word = "סוּס"
    for i = 1, unicode.utf8.len(word) do
        tex.sprint("U+" .. string.format("%x", unicode.utf8.byte(word, i)) .. ": " .. unicode.utf8.sub(word, i, i) .. "\\par")
    end
end
I see three characters next to each other so what exactly is the problem?
(BTW, take a look at goodies-002.tex in the test suite ... you can define colored glyphs as a feature)
Hans
Sorry for being unclear; I will try to clarify. The problem is:
1. I have a TeX file which calls a macro with an argument that contains the characters U+5d5 and U+5bc.
2. The macro passes the argument on to Lua code. By the time it gets there, the characters have turned into U+fb35.
Hi, I don't have a clue about Hebrew, but isn't this a correct normalization[0], not a ligature? If so, the behavior of LuaTeX is perfectly fine. Lua, on the other hand, treats the string as a sequence of bytes, which is just how it treats strings everywhere.
[0] http://www.unicode.org/charts/normalization/chart_Hebrew.html
Regards
Philipp
3. When the Lua code then compares U+fb35 with the XML file, which has the original forms U+5d5 and U+5bc, the comparison of course fails.
So the problem is step 2, which did not happen in earlier versions. If possible, I would like to turn it off. Of course I could try to write some workaround code to counteract this substitution, or whatever it should be called, but that could be complicated and lead to more problems.
Simo
PS: I attached my result of the test case in case this is a problem with my setup. Compiled with ConTeXt MkIV 2012.09.25 21:44.
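For reference, the kind of workaround mentioned above could look roughly like the following. This is only a sketch under stated assumptions: the helper `decompose` and the table `decompositions` are hypothetical names, not part of ConTeXt or LuaTeX, and the table covers only the single pair discussed in this thread (U+FB35 expanded back into U+05D5 U+05BC). It works on raw UTF-8 bytes, so it needs nothing beyond the standard Lua string library.

```lua
-- Hypothetical helper (not ConTeXt API): map a precomposed character back
-- to the sequence it was built from, so that both sides of a comparison
-- use the same form. Keys and values are UTF-8 byte sequences.
local decompositions = {
    -- U+FB35 (VAV WITH DAGESH) -> U+05D5 (VAV) U+05BC (DAGESH)
    ["\239\172\181"] = "\215\149\214\188",
}

local function decompose(word)
    for composed, parts in pairs(decompositions) do
        -- these byte sequences contain no Lua pattern magic characters,
        -- so they can be used directly as pattern and replacement
        word = string.gsub(word, composed, parts)
    end
    return word
end
```

The query key and the value read from the XML file would then both be passed through `decompose` before they are compared, for example `decompose(word) == decompose(key)`. Other precomposed Hebrew forms would need their own entries in the table.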
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________