Hi, Taco:

Last week I stayed at my grandma's home (in a beautiful part of the countryside), which has limited internet access, so I planned to study the LuaTeX source code during the stay. I read a small subset of the LuaTeX source and most of the MetaPost source code, and I have a few newbie questions. I will refer to mp.w as my example, since it is finished and I am more familiar with the C programming language.

- Wouldn't it be great if we removed the packed data dependency? Knuth used packed data because memory was very valuable in the years when he wrote the TeX and MetaFont programs. By packing, he could split a memory word (the space needed for a 32-bit integer) into two, three, or four fields and thus save space. In pdfTeX and MetaPost, almost all the important data structures (all kinds of nodes, boxes, lists) are built on top of this packed representation. For example, in TeX's char_node representation, instead of using one word for the font and one word for the character, he uses a single word by defining the two WEB macros font==type and character==subtype. However, memory consumption is not that important these days, and in LuaTeX other representations already use much more memory (for example, parsing a complicated OpenType font and dumping it into a Lua table using the FontForge library). So wouldn't it be better to remove the packed data? It would make the code more readable at the cost of a little more memory (the TeX/WEB side is very memory efficient, so the footprint would not be big even if we doubled its memory consumption), since we could get rid of the messy WEB or C macros. For example, in Part 21 of mp.w we define more than a dozen macros just to access all the fields of an edge header that A points to. Such code may confuse modern programmers who learned C after the mid-1990s. They would prefer to do something like this:

    typedef struct fill_node {     /* path and pen are illustrative types */
        path *path_p;
        pen  *pen_p;
        /* ... */
        int   miterlim_var;
    } fill_node;

If we want to create a new fill node, then instead of creating the node and setting its path_p to a given pointer p with mp_path_p(t)=p, it would be more understandable if we could write:

    fill_node *mp_new_fill_node (path *p) {
        fill_node *fn = malloc (sizeof (fill_node));  /* a fill_node */
        fn->path_p = p;
        fn->pen_p = NULL;
        fn->red_val = 0;
        /* ... */
        return fn;
    }
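For contrast with the struct version above, here is a rough sketch of the packed scheme being described: every node lives in one big array of memory words, and "fields" are just macros that select parts of a word. This is a simplified illustration, not TeX's or MetaPost's actual definitions.

    #include <stdint.h>

    /* One 32-bit memory word, viewable as an integer, as two halfwords,
       or as two quarterwords plus a halfword (simplified). */
    typedef union {
        int32_t cint;
        struct {
            union {
                uint16_t lh;                    /* left halfword    */
                struct { uint8_t b0, b1; } qq;  /* two quarterwords */
            } u;
            uint16_t rh;                        /* right halfword   */
        } hh;
    } memory_word;

    memory_word mem[65536];      /* the one big node pool               */
    typedef uint16_t pointer;    /* a "pointer" is just an index in mem */

    /* generic node fields ... */
    #define type(p)     (mem[(p)].hh.u.qq.b0)
    #define subtype(p)  (mem[(p)].hh.u.qq.b1)
    #define link(p)     (mem[(p)].hh.rh)

    /* ... and the char_node trick from the message above: the same
       bits under different names. */
    #define font(p)      type(p)
    #define character(p) subtype(p)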
- Why can't we use the IEEE floating-point specification? TeX/MetaPost has its own number representation built in, a rather complicated and unusual system that represents each number as an integer. Isn't IEEE floating point good enough for implementing TeX? Today almost every operating system supports float/double, and the precision is good. If we did this, Part 7 of mp.w and of luatex.web could be removed entirely, and we could also clean up the conversion macros in the rest of the code. If we removed the dependency on Part 7 and Part 9, maybe we could also make our code more portable across machines.

- Why should we do memory management ourselves? I can see that in mp.w we maintain a node pool (a fixed contiguous region of memory). When we want to allocate a node, we call mp_get_node, which does several things: finding an available place for the allocation, reporting an error if memory is exceeded, and then fitting structures of different shapes into the pool. Wouldn't it be better to just ask the operating system's C library to handle these tasks (using malloc and free, as the mp_new_fill_node example above shows)? A modern C library allocator is efficient, and it would also make the LuaTeX code look better.

- From the part of the code I read, if I understand correctly, the only reason for incorporating FontForge is to get the font metrics data? If so, I think FreeType2 would be sufficient for the task? [FreeType2 is much smaller and more efficient, and FontForge itself uses FreeType2.]

Yue Wang
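To make the "represents each number as an integer" remark above concrete, here is a minimal sketch of TeX/MP-style scaled numbers. The macro names are illustrative, not the ones actually used in Part 7 of mp.w.

    #include <stdint.h>

    typedef int32_t scaled;      /* counts multiples of 2^-16           */

    #define UNITY 65536          /* the scaled representation of 1.0    */

    /* Conversions happen only at the boundary; internally everything
       stays integral, so no machine-dependent float rounding occurs.   */
    #define INT_TO_SCALED(n)  ((scaled) ((n) * UNITY))
    #define INT_PART(s)       ((s) / UNITY)
    #define FRAC_PART(s)      ((s) % UNITY)  /* remainder, in 65536ths */

    /* Example: 0.5 is stored as 32768, and 1.25 as 81920. */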
Yue Wang wrote:
- From the part of the code I read, if I understand correctly, the only reason for incorporating FontForge is to get the font metrics data? If so, I think FreeType2 would be sufficient for the task? [FreeType2 is much smaller and more efficient, and FontForge itself uses FreeType2.]
At some time in the future we also want access to the shapes etc., because we want to experiment with runtime glyph adaptation and such. Also, by taking the internal FontForge representation as a starting point, we have the advantage that we can use FontForge's visual (debugging) capabilities for analysing, for instance, font features. (Eventually the amount of FontForge code used will be a bit smaller, because we don't need the write-related code.)

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
Yue Wang has some good points. I second the motion! Hans van der Meer
Yue Wang wrote:
Hi, Taco:
Last week I stayed at my grandma's home (in a beautiful part of the countryside), which has limited internet access, so I planned to study the LuaTeX source code during the stay.
Shouldn't you have been out enjoying that same countryside then? ;)
- Wouldn't it be great if we removed the packed data dependency?
Yes, it would be better. It entails a lot of work though, even for a relatively small program like mplib, and there are not only advantages.
since we can get rid of the messy web or C macros.
We would be able to get rid of the macros, but we would have to introduce lots of typecasts or pointer indirections, depending on whether the node structure becomes a heterogeneous union or a list head + data. I have gone through that process for the exported image backend in mplib (which uses unions; see e.g. psout.w), and it was a fair amount of work even for that small chunk of code.

I am not saying that it shouldn't be done, just that it is not something that can be undertaken lightly. I already intend to make MetaPost's knots and knot lists use truly dynamic memory (a relatively minor task), but I have to find a quiet time for it.

In luatex the problem is more complicated because Pascal and web2c do not like pointers much, so nothing is planned there until after the switch from Pascal to C(web).
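The two layouts mentioned above look roughly like this; a sketch with made-up field names, not the actual psout.w code.

    /* Variant A: one heterogeneous union type; no casts at the call
       site, but every node carries the size of the largest variant. */
    typedef struct node {
        int type;                 /* discriminates the union */
        struct node *link;
        union {
            struct { void *path_p, *pen_p; } fill;
            struct { void *path_p; int dash; } stroked;
        } u;
    } node;

    /* Variant B: a common list head plus separately allocated data;
       nodes stay small, but every field access needs a typecast or an
       extra pointer indirection. */
    typedef struct listhead {
        int type;
        struct listhead *link;
        void *data;               /* points to a fill_data, etc. */
    } listhead;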
- Why can't we use the IEEE floating-point specification?
Please do an internet search for "megapost metapost", and you will find some posts and presentations on the subject (most written by me). Short summary: future versions of MPlib will incorporate either the MPFR library ( http://www.mpfr.org/ ) or decNumber ( http://www.alphaworks.ibm.com/tech/decnumber ). I have yet to decide which one; both have their advantages.
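For reference, this is roughly what MPFR buys: a chosen working precision and an explicit rounding mode on every operation, so results are reproducible across platforms. A minimal sketch against the current MPFR API (which postdates this thread), not a proposed MPlib interface.

    #include <stdio.h>
    #include <mpfr.h>                       /* link with -lmpfr -lgmp */

    int main (void) {
        mpfr_t a, b, sum;
        mpfr_inits2 (128, a, b, sum, (mpfr_ptr) 0); /* 128-bit mantissas */
        mpfr_set_str (a, "0.1", 10, MPFR_RNDN);     /* round to nearest  */
        mpfr_set_str (b, "0.2", 10, MPFR_RNDN);
        mpfr_add (sum, a, b, MPFR_RNDN);            /* explicit rounding */
        mpfr_printf ("0.1 + 0.2 = %.40Rf\n", sum);
        mpfr_clears (a, b, sum, (mpfr_ptr) 0);
        return 0;
    }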
TeX/MetaPost has its own number representation built in, a rather complicated and unusual system that represents each number as an integer. Isn't IEEE floating point good enough for implementing TeX?
Not really, because there are portability issues wrt rounding. The normal C "double" data type is not acceptable without an additional library to make it behave reliably. And for MP, double is not really precise enough anyway, so it is better to jump to something more serious.
If we removed the dependency on Part 7 and Part 9, maybe we could also make our code more portable across machines.
TeX's do-it-yourself integer calculus is the most portable you can get.
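Schematically, that integer calculus looks like this; a sketch in the spirit of TeX's take_fraction, not Knuth's actual code (which even avoids needing 64-bit arithmetic). Every step is exact integer work, so the result is bit-identical on every conforming C implementation.

    #include <stdint.h>

    typedef int32_t scaled;                  /* multiples of 2^-16 */

    /* Multiply two scaled values, rounding half away from zero.
       The 64-bit product is exact, and division truncates toward
       zero (guaranteed by C99), so no platform dependence remains. */
    static scaled mult_scaled (scaled x, scaled y) {
        int64_t t = (int64_t) x * (int64_t) y;   /* in 2^-32 units  */
        int64_t half = 32768;                    /* 0.5 * 2^16      */
        return (scaled) ((t >= 0 ? t + half : t - half) / 65536);
    }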
- Why should we do memory management ourselves?
It is still there mostly for the same reason as mentioned above for the packed memory: it is a lot of work to change it.
Wouldn't it be better to just ask the operating system's C library to handle these tasks (using malloc and free, as the mp_new_fill_node example shows)? A modern C library allocator is efficient, and it would also make the LuaTeX code look better.
This is true for mp.w, but I am not so sure that the system malloc() will perform better than texnodes.c (which essentially uses 10 dedicated avail lists). Anyway, see the remarks above.
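The dedicated-avail-list idea amounts to something like this toy version (illustrative only; the real texnodes.c differs): most allocations become a pointer pop, which a general-purpose malloc() has a hard time beating.

    #include <stdlib.h>

    #define CLASSES 10                 /* one free list per node size */

    typedef struct avail { struct avail *link; } avail;
    static avail *avail_head[CLASSES];

    /* bytes must be fixed per class and at least sizeof(avail) */
    void *get_node (int c, size_t bytes) {
        avail *p = avail_head[c];
        if (p != NULL) {               /* reuse a freed node: O(1) pop */
            avail_head[c] = p->link;
            return p;
        }
        return malloc (bytes);         /* list empty: fall back        */
    }

    void free_node (void *q, int c) {  /* push; never returned to OS   */
        avail *p = (avail *) q;
        p->link = avail_head[c];
        avail_head[c] = p;
    }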
- From the part of the code I read, if I understand correctly, the only reason for incorporating FontForge is to get the font metrics data? If so, I think FreeType2 would be sufficient for the task? [FreeType2 is much smaller and more efficient, and FontForge itself uses FreeType2.]
Not only that. Stuff that needs doing:
* parsing of Type1 & Type0 (CID composite) fonts
* parsing of sfnt containers (ttc and dfont)
* parsing of TTF and OTF fonts (including AAT)
* bounding box calculations for PostScript-based glyphs (because this information is not in the font itself)
* parsing and processing of GPOS & GSUB tables
* font reencoding
* converting all of this to a Lua table
To be honest, I have no idea how many of those things can be done by freetype2, but I highly doubt its OTF support will be sufficient.
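To give a feel for the "parsing of sfnt containers" item on the list above: a minimal sketch, with assumed helper names, of reading the table directory that opens every TTF/OTF file. All multi-byte values are big-endian; error handling and the TTC header are omitted.

    #include <stdio.h>
    #include <stdint.h>

    static uint32_t u32 (FILE *f) {            /* big-endian read */
        uint32_t v = 0;
        for (int i = 0; i < 4; i++) v = (v << 8) | (uint32_t) getc (f);
        return v;
    }

    void list_sfnt_tables (FILE *f) {
        uint32_t version = u32 (f);    /* 0x00010000, 'OTTO', 'true' */
        uint16_t num_tables = (uint16_t) (u32 (f) >> 16);
        u32 (f);                       /* skip entrySelector, rangeShift */
        for (uint16_t i = 0; i < num_tables; i++) {
            uint32_t tag = u32 (f), sum = u32 (f);
            uint32_t off = u32 (f), len = u32 (f);
            (void) version; (void) sum;
            printf ("%c%c%c%c: offset %u, length %u\n",
                    (char) ((tag >> 24) & 0xff), (char) ((tag >> 16) & 0xff),
                    (char) ((tag >> 8) & 0xff),  (char) (tag & 0xff),
                    off, len);
        }
    }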
Hi, Taco:
Thanks very much for the explanation, and I agree with you that
removing packed data and rewrite the memory allocation parts will take
huge amount of work as it is the structures that most code built upon,
and many TeX implementations do not touch that (including Common TeX
which is a hand-conversion of Knuth's TeX, and your original cxtex
which is a hand-conversion of pdfTeX). As far as I know, only NTS and
its successor, ExTeX have modernized TeX's data structures. But what's
the benefit of converting TeX and MP's web source code into c or cweb
if the old-fashioned data structures and algorithms remain unchanged?
And Getting rid of the web2c step and making debugging/extension much
easier are the ultimate goals for the C version? I doubt not.
I think the main reason TeX's development is not as active as that of other famous software projects is that TeX's program is not easy to read: only experienced programmers with the highest level of programming ability (like Taco, of course) know how to work on its code. During LuaTeX's development I found several bugs, but I didn't know how to debug the program and send you patches. I felt frustrated that I could not understand many parts of the code, even though TeX's source has documentation of the highest quality. Other projects like Lua may not have such good documentation and source comments, but their code is very clear: the parser, lexer, various libraries, compiler, and interpreter are separated, so I can start debugging or writing extensions after skimming two or three of the source files. In the TeX world things are quite different: all parts of the program depend on each other, and one cannot understand one part without having read the previous parts.
Moreover, as I have pointed out, TeX's data structures and memory management are not friendly to newcomers. I started reading TeX: The Program three years ago, because some people recommended it to me, saying that the Pascal program was written by Don Knuth. (As a university freshman at the time, I only knew how to program in Pascal, the official language for NOI [the National Olympiad in Informatics], and knew a little about data structures and algorithms such as searching, hashing, and simple graph algorithms.) However, after reading several parts I got confused, not because of the unfamiliar WEB notation but because of the complicated macros, so I stopped after that first try. I then learned the C programming language at university; it became my main language, and today I have almost forgotten Pascal. I was excited when I heard about the road map of the LuaTeX project, since it said it would eventually convert all the Pascal WEB code into C in 2010 (the 2007 version; Taco has changed the road map several times over the years: if I remember right, the first stable version was to be released in 2007, then slipped to 2008, and is now postponed to 2009), because I thought many old parts could be replaced to make the program more readable. However, after reading the CWEB version of MetaPost, I still find understanding the code somewhat difficult, let alone debugging or extending it (though it went better than that first try three years earlier).
So I think the irreplaceable significance of turning the original WEB version of TeX into a C version is to make TeX a successful open source project that more people can join: they can study its source code, learn how to design typesetting software from scratch, and then extend the related algorithms. So I think the huge amount of conversion work is worth it (maybe we can call the result MetaPost 2.0). In fact, I don't think it will take too much work, as the rewriting can happen on the go whenever a part is moved out of the web file. NTS, the Java implementation of TeX, took about (2001-1998=) three years to release a beta with all the structures changed. (Of course, it is interesting that no implementation of TeX except pdfTeX has managed to become the mainstream TeX. Is compatibility the most important issue?)
Just my two cents. Best wishes to you.
Yue Wang
Yue Wang wrote:
And aren't getting rid of the web2c step and making debugging/extension much easier the ultimate goals for the C version? I don't doubt it.
We need to keep a working engine, so the LuaTeX project has deliberately chosen a stepwise approach. (By the way, although NTS resulted in a working version of TeX, in practice it was way too slow to be useful; a rewrite of TeX resulting in a variant many times slower than the current one is not acceptable.)
In the TeX world things are quite different: all parts of the program depend on each other, and one cannot understand one part without having read the previous parts.
Another issue is that patching any bit of TeX code must be done very carefully; this has been proven by the adding of a backend ... rather strict control over extensions and changes is needed in order to keep TeX's reputation for stability up. A one-line change could result in, for instance, a rounding issue (I mention it because we ran into one some time ago) and can have rather drastic consequences; you don't want that to happen with machinery that has to reproduce a document at the pixel level. A patch that works on one machine might eventually result in many problems all over the world, if only because most users don't update frequently and depend on formal distributions. (By the way, the same is true for fonts and other resources ... small changes can have huge consequences.)
Moreover, as I have pointed out, TeX's data structures and memory management are not friendly to newcomers.
Well, lucky us that 99.9% of users work with TeX at the macro level :-)
The road map said it would eventually convert all the Pascal WEB code into C in 2010 (the 2007 version; Taco has changed the road map several times over the years: the first stable version was to be released in 2007, then slipped to 2008, and is now postponed to 2009), because I thought many old parts could be replaced to make the program more readable.
Well, we will probably change the road map a few more times; the 2007 and 2008 versions are rather stable, depending on what one does. Keep in mind that we aim at pdfTeX compatibility, which means that regular stuff (not using Lua at all) should just work. Later in 2009 we will have another formal release, which opens up a bit more (and the reason for opening up via Lua is that users will use Lua for extensions and not so much start patching the core engine written in Pascal or C).
So I think the huge amount of conversion work is worth it (maybe we can call the result MetaPost 2.0). In fact, I don't think it will take too much work, as the rewriting can happen on the go whenever a part is moved out of the web file.
As said before, the idea is that one uses the Lua interface to a rather abstract engine to extend TeX; eventually there might even be less and less core code.
NTS, the Java implementation of TeX, took about (2001-1998=) three years to release a beta with all the structures changed.
Actually, NTS still had the same concepts as TeX and was fully compatible, which made the inner structures somewhat suboptimal.
It is interesting that no implementation of TeX except pdfTeX has managed to become the mainstream TeX. Is compatibility the most important issue?
Indeed. Compatibility has been a key concept of TeX for over 30 years and will remain one; pdfTeX became a success because there was rather strict control over releases (officially once per year, with the code frozen months before the TeX Live code freeze), and eventually LuaTeX will end up the same. Hans
Hi,
In my experience LuaTeX's output is very different from pdfTeX's. The most notable difference is that they give me different line breaks even with the same source and the same TFMs. While I was working on a Chinese translation of Karl's TeX for the Impatient, it struck me that LuaTeX's line breaking was quite different from pdfTeX's and, what's worse, LuaTeX gave me more overfull boxes. Of course, I was using an old version of LuaTeX distributed with TeX Live 2008 (0.25.x, if memory serves). I doubt whether it will be possible for LuaTeX to produce results similar to TeX's (let alone pixel-identical ones). But is compatibility that important? XeTeX is not compatible with the good old TeX, but it is still widely used in Asia (here we don't use LuaTeX simply because there is no LaTeX package support). Knuth also noted that there are several rounding flaws in recent versions of TeX that he cannot change, but he recommended that modern implementations fix them (see TUGboat, Volume 29 (2008), No. 2). Yue Wang
Yue Wang wrote:
In my experience LuaTeX's output is very different from pdfTeX's. The most notable difference is that they give me different line breaks even with the same source and the same TFMs. While I was working on a Chinese translation of Karl's TeX for the Impatient, it struck me that LuaTeX's line breaking was quite different from pdfTeX's and, what's worse, LuaTeX gave me more overfull boxes. Of course, I was using an old version of LuaTeX distributed with TeX Live 2008 (0.25.x, if memory serves). I doubt whether it will be possible for LuaTeX to produce results similar to TeX's (let alone pixel-identical ones).
We recently tested LuaTeX (a more recent version than the one on TeX Live) with The TeXbook, and there are only a few cases where we see differences, especially when ligatures occur at line breaks; this is something that will be sorted out. 100% compatibility will never be reached, because we use a slightly different route (separation of ligature building, kerning, hyphenation, and paragraph building), but in practice this should not be a problem. So, best use a recent version.
But is compatibility that important?
It is. Of course there will be differences, e.g. due to different hyphenation patterns (more complete, since we have UTF and more than 256 characters can be used in patterns), and also because, when using fonts, some of traditional TeX's limitations are gone (the number of distinct heights and depths), and some parameters in OpenType variants might differ, which also leads to different spacing. But in general the expected behaviour of TeX remains; there is so much macro code out there that we will not break anything in that respect (unless it can be easily compensated for, like some of the removed pdfTeX primitives).
XeTeX is not compatible with the good old TeX, but it is still widely used in Asia (here we don't use LuaTeX simply because there is no LaTeX package support).
Well, it's as with good old TeX ... it will take a few years for everyone to catch up; when pdfTeX came around, it also took a while before all features were supported by all macro packages, so I'm not too worried about that. Too fast a change would also backfire on users. For ConTeXt that's not so much an issue, because there is a beta-update infrastructure and most users are willing to use betas, but with LaTeX you need to keep in mind that there are many users out there who depend on a really stable system (with no experimental stuff that breaks your document processing every now and then).
Knuth also noted that there are several rounding flaws in recent versions of TeX that he cannot change, but he recommended that modern implementations fix them (see TUGboat, Volume 29 (2008), No. 2).
Maybe rounding problems are less of an issue in CJK fonts, but in, for instance, Arabic typesetting, where many glyphs are pasted together by shifting, it really matters; one scaled point might result in a .01bp shift in a PDF file, which is visible to the sensitive eye. Hans
Hans Hagen wrote:
so, best use a recent version
I think that 0.25.x might have the big bad hyphenation bug. However, I suppose even when using a version where this has been sorted out, TeX's behavior for things like shelf{}ful will not be imitated (in this case, TeX separates the ligature unless it does a hyphenation pass on the paragraph, which can be forced with \pretolerance=-10000, in which case the ligature gets recombined). That's not something worth imitating. -- David Kastrup
David Kastrup wrote:
I think that 0.25.x might have the big bad hyphenation bug.
It had several of those, in fact.
However, I suppose even when using a version where this has been sorted out, TeX's behavior for things like shelf{}ful will not be imitated (in this case, TeX separates the ligature unless it does a hyphenation pass on the paragraph, which can be forced with \pretolerance=-10000, in which case the ligature gets recombined). That's not something worth imitating.
Last time we checked the typeset output (two weeks ago), there were some dozens of differences between pdftex and luatex when compiling the texbook, and these differences all fell into one of three categories:
* Missed hyphenations. This is a known bug caused by nested discretionaries that will be solved eventually (#9 in the tracker).
* Additional hyphenations (in typewriter text). These appear because luatex is willing to hyphenate words that start with a non-letter: luatex just starts the word at the first thing that is a letter. An argument can be made for maintaining compatibility, but currently we do not intend to make that change, as the new behaviour is desired much more often. (The same can be said for hyphenating the first word in a paragraph, but that case never arises in the texbook.)
* Extra or slightly different ligatures. These are caused by stuff like the shelf{}ful example above, and this incompatibility will definitely remain. TeX's exact behaviour was an implementation artifact that is pretty hard to mimic, and any attempt to do so would lead to extremely ugly code.
Best wishes, Taco
Taco Hoekwater wrote:
* Additional hyphenations (in typewriter text). These appear because luatex is willing to hyphenate words that start with a non-letter: luatex just starts the word at the first thing that is a letter. An argument can be made for maintaining compatibility, but currently we do not intend to make that change, as the new behaviour is desired much more often. (The same can be said for hyphenating the first word in a paragraph, but that case never arises in the texbook.)
Do you have an example where the new behavior would be desired?
* Extra or slightly different ligatures. These are caused by stuff like the shelf{}ful example above, and this incompatibility will definitely remain. TeX's exact behaviour was an implementation artifact that is pretty hard to mimic, and any attempt to do so would lead to extremely ugly code.
Ugly code and ugly/inconsistent results. -- David Kastrup
Taco Hoekwater wrote:
* Additional hyphenations (in typewriter text).
Sorry, I wrote a totally wrong explanation. My brain must have been temporarily clogged up or something. There was an incompatibility wrt word boundaries in the beginning, and so I just assumed that was still the case. The actual, true reason is that luatex does not listen to \hyphenchar=-1 as a means to prevent hyphenation. I am now reconsidering whether this is a bug or not. Best wishes, Taco
Martin Schröder
2009/1/29 Taco Hoekwater:
The actual, true reason is that luatex does not listen to \hyphenchar=-1 as a means to prevent hyphenation. I am now reconsidering whether this is a bug or not.
At least it's an incompatibility (see The TeXbook, p. 454, 3rd paragraph).
I'd call it a bug since I don't see any situation where ignoring the documented setting would be an advantage. -- David Kastrup
Hi,
so, best use a recent version
Well, I switched to the LuaTeX in the current ConTeXt minimals (0.31.3, I believe). I have an essay (typeset in ConTeXt using the TeXGyre Palatino font) which contains only plain text and math formulas. I compiled the document with MKIV and MKII, and the line-break results are very different. Maybe that's because MKIV and MKII are different. I will test some plain TeX documents later. Yue Wang
By the way, Prof. Knuth regards compatibility as a very important issue because he uses plain TeX. For his books he usually calculates the page numbers/cross-references/index himself (see The TeXbook or Concrete Mathematics). If the line-breaking algorithm changed, he would have to recalculate all the numbers. With advanced typesetting macros like ConTeXt or LaTeX, document compilation uses several passes and the cross-references are always right. Moreover, if we cannot make it fully compatible (pixel level), why should we bother about the issue? Modern users won't care (as macro packages like ConTeXt change more rapidly anyway).
Yue Wang
Yue Wang wrote:
Well, I switched to the LuaTeX in the current ConTeXt minimals (0.31.3, I believe). I have an essay (typeset in ConTeXt using the TeXGyre Palatino font) which contains only plain text and math formulas. I compiled the document with MKIV and MKII, and the line-break results are very different. Maybe that's because MKIV and MKII are different. I will test some plain TeX documents later.
This is not a topic for this list, so best discuss it on the ConTeXt list. These differences are known (and have been discussed there) ... in MkIV we use OpenType, don't have the ht/dp limitations, and effectively have a slightly different ex value, which in turn determines a couple of things like line heights. (By the way, in ConTeXt compatibility is less important than in LaTeX, which is often used in journals that need to be reproduced after tens of years.) Hans
Maybe that's because MKIV and MKII are different.
Of course that's the reason! What you're observing is the result of the macro package, ConTeXt, not the underlying typesetting engines, pdfTeX and LuaTeX, respectively. Hence, this particular issue is ConTeXt's and does not belong here, as Hans points out. Arthur
Later in 2009 we will have another formal release, which opens up a bit more (and the reason for opening up via Lua is that users will use Lua for extensions and not so much start patching the core engine written in Pascal or C).
Well, I think that without a rough understanding of TeX's algorithms and of various specifications (including PDF's and various fonts'), mere mortals cannot write extension code in Lua. The situation in the LaTeX world proves this. XeTeX's success lies in the fact that it is simple and easy to use: without much knowledge of fonts, one can use its font mechanism just as in other programs like InDesign, so it is easy to write a higher-level wrapper for LaTeX (like fontspec). The situation for LuaTeX is different. LuaTeX's API is rather complicated and difficult to understand unless the user has a good understanding of fonts and of TeX's internals. Moreover, instead of using ATS or ICU directly, users have to write the layout code themselves, which is a huge task. In order to write a fontspec equivalent for LuaTeX, the macro writer has to be familiar with:
- layout algorithms (similar to the ones used in ICU)
- font structure (especially the FontForge API)
- the Lua language (he/she should read Programming in Lua at least)
- the TeX language (he/she should have finished reading The TeXbook several times)
- TeX internals (tokens, nodes, fonts, catcodes, hyphenation, line breaking, font expansion, margin kerning; also noads and mlists if the font package deals with math fonts)
That might be one of the reasons why there is no such LaTeX package for LuaTeX. In the ConTeXt world things are quite different, since Hans Hagen is a Lua+TeX+FontForge+MetaPost+whatever expert. So unless there are several Hans Hagens in the LaTeX world, ordinary LaTeX users will not benefit from LuaTeX development. Yue Wang
Yue Wang wrote:
Later in 2009 we will have another formal release, which opens up a bit more (and the reason for opening up via Lua is that users will use Lua for extensions and not so much start patching the core engine written in Pascal or C).
Well, I think that without a rough understanding of TeX's algorithms and of various specifications (including PDF's and various fonts'), mere mortals cannot write extension code in Lua. The situation in the LaTeX world proves this. XeTeX's success lies in the fact that it is simple and easy to use: without much knowledge of fonts, one can use its font mechanism just as in other programs like InDesign, so it is easy to write a higher-level wrapper for LaTeX (like fontspec). The situation for LuaTeX is different. LuaTeX's API is rather complicated and difficult to understand unless the user has a good understanding of fonts and of TeX's internals. Moreover, instead of using ATS or ICU directly, users have to write the layout code themselves, which is a huge task. In order to write a fontspec equivalent for LuaTeX, the macro writer has to be familiar with:
Well, but even then it's just a few macro writers and not all users. Also keep in mind that in 30 years of TeX, fonts have never been easy and have always demanded some expertise, and there was always a small group (per macro package) that dealt with them. Also, we are aware that XeTeX is easier to use out of the box with respect to fonts, which is great; we definitely do not advocate that users use LuaTeX instead of XeTeX. They should use what fits best, and it might be that 90% of users are better off with XeTeX.
- layout algorithms (similar to the ones used in ICU)
Well, use XeTeX then; there will be no hard-coded layout extensions. For instance, as part of the Oriental TeX project an alternative line-break routine is being written (a subproject by Idris, Taco, and me), and that's all done in Lua.
- font structure (especially the FontForge API)
Sure, but in any case one should be familiar with what a font is; this is normally not something that an average user will deal with (and once a macro package supports something, it's even more hidden and stays forever).
- the Lua language (he/she should read Programming in Lua at least)
That's not too much work, as we've chosen a language which is relatively easy to learn and is not burdened by tons of libraries that you need to keep up with (apart from the installation mess that would result from that).
- the TeX language (he/she should have finished reading The TeXbook several times)
It depends on what one does.
- TeX internals (tokens, nodes, fonts, catcodes, hyphenation, line breaking, font expansion, margin kerning; also noads and mlists if the font package deals with math fonts)
That might be one of the reasons why there is no such LaTeX package for LuaTeX. In the ConTeXt world things are quite different, since Hans Hagen is a Lua+TeX+FontForge+MetaPost+whatever expert. So unless there are several Hans Hagens in the LaTeX world, ordinary LaTeX users will not benefit from LuaTeX development.
Well, eventually it will happen, I guess. Just look at it from the other end ... when I started writing ConTeXt I didn't have a clue what TeX was doing, and I simply didn't understand enough of TeX to see what happened in LaTeX code (we're talking 1992 or so); eventually ConTeXt caught up quite well, and this time ConTeXt happens to be a bit ahead with respect to LuaTeX. By the way, with pdfTeX similar things happened ... ConTeXt was using some features of pdfTeX before LaTeX did, simply because I was involved in the development. Such is life. Hans
On Thu, Jan 29, 2009 at 01:02:26PM +0800, Yue Wang wrote:
That might be one of the reasons why there is no such LaTeX package for LuaTeX. In the ConTeXt world things are quite different, since Hans Hagen is a Lua+TeX+FontForge+MetaPost+whatever expert. So unless there are several Hans Hagens in the LaTeX world, ordinary LaTeX users will not benefit from LuaTeX development.
That is why I believe that we should have some means to share most of the lower-level Lua code across macro packages. For instance, writing 2 or 3 OpenType layout modules is a waste of time, and it is a big task of its own, especially if you want to cover all scripts; there exist 3 free OpenType implementations and only one gets it almost right. Regards, Khaled -- Khaled Hosny, Arabic localizer and member of the Arabeyes.org team
Khaled Hosny wrote:
That is why I believe that we should have some means to share most of the lower-level Lua code across macro packages. For instance, writing 2 or 3 OpenType layout modules is a waste of time, and it is a big task of its own.
It depends ... in that respect writing two or more macro packages is also a waste of time (or even different variants of tables within a macro package, etc.). Concerning writing such an engine ... for me it's part of the fun (especially in relation to the Oriental TeX project), and I'm kind of glad that in ConTeXt we can do it the way that suits ConTeXt best (which in some aspects can be rather different from the way LaTeX does things; for instance, we provide extra features on top of what the font does, have rather ConTeXt-specific tracing options, etc.). Personally, I start from what users ask for, which kind of determines the order of development (so, currently Arabic and a bit of CJK).
It is a big task of its own, especially if you want to cover all scripts; there exist 3 free OpenType implementations and only one gets it almost right.
As I've said before, at some point I will provide a kind of generic variant of the ConTeXt OpenType support, but currently my priority lies with getting the Oriental TeX project's font support done (which is an interplay between font design and handling features). [So, any generic ConTeXt code would be provided as-is, with development driven via the ConTeXt-related lists.] With the LuaTeX project we're not in that much of a hurry anyway: we reimplement parts of TeX, honouring the basic design of TeX, adding only a few things, and opening things up step by step; after all, we have working pdfTeXs and XeTeXs, so there is not that much need to hurry, cook up half solutions, and end up in incompatible patch mode. Also, even if much of the low-level ConTeXt MkIV Lua code can be seen as generic, I foresee many problems in reuse, simply because macro packages differ in fundamental ways (otherwise there would not be different macro packages at all). [Also, keep in mind that LaTeX has functionality split up over many packages, where redefining of low-level code happens, which in turn does not really help.] Hans
On Thu, Jan 29, 2009 at 11:17:27AM +0100, Hans Hagen wrote:
It depends ... in that respect writing two or more macro packages is also a waste of time (or even different variants of tables within a macro package, etc.).
Somehow it is ;). People usually have different needs and thus different macro packages, but I wonder why someone would need a very different low-level OpenType implementation. But I may be wrong.
It is a big task of its own, especially if you want to cover all scripts; there exist 3 free OpenType implementations and only one gets it almost right.
But LuaTeX has such great features that I want to use with other macro packages; I want to typeset some Arabic technical documents with Texinfo, for example. My point is that it would be great if we could have a kind of reference implementation of some of the much-needed functionality, like OpenType layout, the Unicode BiDi algorithm, Indic reordering and such, in a macro-package-independent way that one can plug in anywhere and write higher-level macro support on top of. If a macro package doesn't feel like using it, they can write their own, of course. I can only argue for this, though; I'm not the one who writes the code here. Regards, Khaled
Khaled Hosny wrote:
Somehow it is ;). People usually have different needs and thus different macro packages, but I wonder why someone would need a very different low-level OpenType implementation. But I may be wrong.
Well, first of all I need the exercise of writing the machinery in order to understand OpenType; also, I often want to add extra visualization stuff for manuals and such.
But LuaTeX has such great features that I want to use with other macro packages; I want to typeset some Arabic technical documents with Texinfo, for example. My point is that it would be great if we could have a kind of reference implementation of some of the much-needed functionality, like OpenType layout, the Unicode BiDi algorithm, Indic reordering and such, in a macro-package-independent way that one can plug in anywhere and write higher-level macro support on top of. If a macro package doesn't feel like using it, they can write their own, of course.
Well, maybe eventually the MkIV code can serve that purpose; but as said before, once it all works OK, I'll split modules and cook up some minimal system (this is also part of a more layered ConTeXt MkIV). (By the way, a macro-package-independent way is non-trivial; just look at how much "independent" stuff is out there. Long ago I made supp-pdf, and even for something small and isolated like that, LaTeX interference problems kept coming up, so eventually I ended up stripping some code and having a more extended module for ConTeXt.) Hans
Khaled Hosny wrote:
Somehow it is ;). People usually have different needs and thus different macro packages, but I wonder why someone would need a very different low-level OpenType implementation. But I may be wrong.
I think OpenType is more complicated than people tend to realize, and less amenable to the 'black box' approach than one might think. I'm judging from how well applications handle _Latin_ script, which supposedly is simple. It's better to have Lua code that can be customized easily.
On 1 Feb 2009, at 02:11, Barry Schwartz wrote:
I think OpenType is more complicated than people tend to realize, and less amenable to the 'black box' approach than one might think.
Complicated, and also quite different from the TeX ligkern model. Also, let us not forget that OpenType is aimed at a GUI where you can activate and deactivate properties and see what happens. The result of a bunch of combined OpenType transformations is not always trivial to predict, but when you are in a GUI you don't care about that; you just see what happens. -- Yannis Haralambous, Télécom Bretagne
Hi again. Yue Wang wrote:
So I think the huge amount of conversion work is worth it (maybe we can call the result MetaPost 2.0). In fact, I don't think it will take too much work, as the rewriting can happen on the go whenever a part is moved out of the web file.
Actually, the pascal web portion of luatex is steadily shrinking. If the pascal->C conversion rate stays as it is now, all traces of pascal will be gone by the summer (but it may take a little longer, please don't shoot me if I don't live up to the deadline). Once that is done, the C files will be converted back into Cweb files (this may take quite some time itself, depending on how much source layout reorganisation needs to be done). Best wishes, Taco
To be honest, I have no idea how many of those things can be done by freetype2, but I highly doubt its OTF support will be sufficient.
As a matter of fact, while FreeType 1 came with a few utilities demonstrating basic OpenType capabilities, FreeType 2 dropped them entirely, and it seems, from browsing the documentation, that there is not even an OpenType-specific API in the latter (whereas there is one for BDF, PCF, PFR fonts ...). I've always wondered about this fact but couldn't find any reason for it. If Werner Lemberg reads the list, maybe he can enlighten us. Arthur
Arthur Reutenauer wrote:
As a matter of fact, while FreeType 1 came with a few utilities demonstrating basic OpenType capabilities, FreeType 2 dropped them entirely, and it seems, from browsing the documentation, that there is not even an OpenType-specific API in the latter (whereas there is one for BDF, PCF, PFR fonts ...). I've always wondered about this fact but couldn't find any reason for it.
If Werner Lemberg reads the list, maybe he can enlighten us.
There is no real OT support in FreeType2. If you want kerning you'd better have a kern table. :)
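To illustrate the point: FT_Get_Kerning is real FreeType 2 API (the surrounding code is a simplified sketch), and it only ever consults the legacy 'kern' table, so for an OpenType font whose kerning lives in GPOS it typically reports nothing.

    #include <ft2build.h>
    #include FT_FREETYPE_H

    /* Kerning between two characters in font units, or 0 if FreeType
       has no 'kern' data (the usual case for GPOS-only OTF fonts).  */
    long kern_units (FT_Face face, unsigned long left, unsigned long right) {
        FT_Vector delta;
        FT_UInt l = FT_Get_Char_Index (face, left);
        FT_UInt r = FT_Get_Char_Index (face, right);
        if (FT_Get_Kerning (face, l, r, FT_KERNING_UNSCALED, &delta))
            return 0;
        return delta.x;
    }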
And the advantage of using FontForge could also be to have code for modifying glyphs and saving them in new fonts...
Hi,
There is no real OT support in FreeType2. If you want kerning you'd better have a kern table. :)
Aren't OT features easy to implement? I use LCDF TypeTools (a very small program that does not depend on any font library), and with it it is easy to convert OpenType fonts, with whatever features I want, to TFM and Type1 fonts. By the way, there is a small tool called otfdump (which uses a very small and CPU/memory-efficient library, libotf) that can dump TTF and OTF files (much like how FontForge is used in LuaTeX). Dumping TeXGyre Heros Regular takes only 0.058 seconds, and Adobe Song Std Light takes 0.140 seconds. If we used this library, I think there would be no need to implement a font cache system in ConTeXt, since it is very memory-efficient and fast. If I am wrong, please point it out; I am a font newbie and want to learn more. Yue Wang
Yue Wang wrote:
If I am wrong, please point it out; I am a font newbie and want to learn more.
If you only have ligatures and kerning then you can indeed use simple approaches, but there happens to be more. (Indeed, caching is not always needed, and if I cook up a generic subset, caching will not be part of the game at all, because performance is the least of my concerns there.) Hans
On Wed, Jan 28, 2009 at 08:27:07PM +0100, Arthur Reutenauer wrote:
To be honest, I have no idea how many of those things can be done by freetype2, but I highly doubt its OTF support will be sufficient.
As a matter of fact, while FreeType 1 came with a few utilities demonstrating basic OpenType capabilities, FreeType 2 dropped them entirely, and it seems, from browsing the documentation, that there is not even an OpenType-specific API in the latter (whereas there is one for BDF, PCF, PFR fonts ...). I've always wondered about this fact but couldn't find any reason for it.
If Werner Lemberg reads the list, maybe he can enlighten us.
AFAIK, they felt that OpenType layout should be handled by a higher-level layer, which happens to be HarfBuzz (based on the FreeType 1 OT code). http://www.freedesktop.org/wiki/Software/HarfBuzz Regards, Khaled
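In practice the division of labour looks like this; a sketch using today's HarfBuzz API, which has changed a great deal since the 2009 code linked above: FreeType provides the face, HarfBuzz runs the OpenType GSUB/GPOS shaping.

    #include <stdio.h>
    #include <hb.h>
    #include <hb-ft.h>           /* FreeType <-> HarfBuzz glue */

    void shape (FT_Face ft_face, const char *utf8) {
        hb_font_t *font = hb_ft_font_create_referenced (ft_face);
        hb_buffer_t *buf = hb_buffer_create ();
        hb_buffer_add_utf8 (buf, utf8, -1, 0, -1);
        hb_buffer_guess_segment_properties (buf); /* script, direction */
        hb_shape (font, buf, NULL, 0);            /* apply GSUB/GPOS   */

        unsigned int n;
        hb_glyph_info_t *info = hb_buffer_get_glyph_infos (buf, &n);
        hb_glyph_position_t *pos = hb_buffer_get_glyph_positions (buf, &n);
        for (unsigned int i = 0; i < n; i++)
            printf ("glyph %u, x advance %d\n",
                    info[i].codepoint, pos[i].x_advance);

        hb_buffer_destroy (buf);
        hb_font_destroy (font);
    }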
I didn't know that HarfBuzz was supposed to be a Persian word :-) I knew what it was, of course, but I had no idea where the name came from. And thanks for the explanations! Arthur
participants (10)
- Arthur Reutenauer
- Barry Schwartz
- David Kastrup
- Hans Hagen
- Hans van der Meer
- Khaled Hosny
- Martin Schröder
- Taco Hoekwater
- Yannis Haralambous
- Yue Wang