Hi, Taco:
Thanks very much for the explanation, and I agree with you that removing the packed data and rewriting the memory allocation parts would take a huge amount of work, since these are the structures that most of the code is built upon, and many TeX implementations do not touch them (including Common TeX, which is a hand-conversion of Knuth's TeX, and your original cxtex, which is a hand-conversion of pdfTeX). As far as I know, only NTS and its successor, ExTeX, have modernized TeX's data structures. But what is the benefit of converting the WEB source code of TeX and MP into C or CWEB if the old-fashioned data structures and algorithms remain unchanged?
Are getting rid of the web2c step and making debugging/extension much easier really the ultimate goals of the C version? I doubt it.
I think the main reason TeX's development is not as active as that of other famous software projects is that TeX's program is not easy to read, and only experienced programmers with the highest level of programming ability (like Taco, of course) know how to work on its code. During LuaTeX's development I found several bugs in LuaTeX, but I did not know how to debug the program and send you patches. I felt depressed because I could not understand many parts of the code, even though TeX's source code has documentation of the best quality. Other projects like Lua may not have such good documentation and source-code comments, but their code is very clear: the parser, lexer, various libraries, compiler, and interpreter are separated, so I can start debugging or writing extensions after skimming two or three of the source files. But in the TeX world things are quite different: all parts of the program depend on each other, and one cannot understand some parts without having read the previous ones.
Moreover, as I have pointed out, TeX's data structures and memory management are not friendly to newcomers. I started reading TeX: The Program three years ago (as a university freshman at the time, I only knew how to program in Pascal, which is the official language of the NOI [National Olympiad in Informatics], and knew a little about data structures and algorithms such as searching, hashing, and simple graph algorithms), because some people recommended it to me, saying that the Pascal program was written by Don Knuth. However, after reading several parts I got confused, not because of the strange WEB notation but because of the complicated macros, so I gave up on my first attempt.
Then I learned the C programming language while studying at university and it became my main language; today I have almost forgotten Pascal. I was excited when I heard about the road map of the LuaTeX project, since it said that all the Pascal WEB code would eventually be converted into C by 2010 (that was the 2007 version; Taco has changed the road map several times over the years. If I remember right, the first stable version was to be released in 2007, then moved to 2008, and has now been postponed to 2009), because I thought many old parts could be replaced, making the program more readable. However, after reading the CWEB version of MetaPost, I still find the code somewhat difficult to understand, let alone debug or extend (though of course it went better than my first try three years earlier).
So I think the irreplaceable significance of converting the original WEB version of TeX into C is to turn TeX into a successful open-source project that more people can join: they can study its source code, learn how to design typesetting software from scratch, and then extend the related algorithms. So I think the huge amount of work involved is worth it (maybe we can call it MetaPost 2.0). In fact, I don't think it will take too much work, since the rewriting can happen on the go whenever a part is moved out of the web file. The Java implementation of TeX, NTS, took about three years (1998-2001) to release a beta with all the structures changed. (Of course, it is interesting that no implementation of TeX except pdfTeX has become the mainstream TeX. Is compatibility the most important issue?)
Just my two cents, Best wishes to you.
Yue Wang
On Wed, Jan 28, 2009 at 5:22 PM, Taco Hoekwater wrote:
Yue Wang wrote:
Hi, Taco:
Last week I stayed at my grandma's home (located in the beautiful countryside), which has limited internet access, so I planned to study the LuaTeX source code during the stay.
Shouldn't you have been out enjoying that same countryside then? ;)
- wouldn't it be great if we remove the packed data dependency?
Yes, it would be better. It entails a lot of work though, even for a relatively small program like mplib, and there are not only advantages.
since we can get rid of the messy web or C macros.
We would be able to get rid of the macros, but we would have to introduce lots of typecasts or pointer indirections, depending on whether the node structure becomes a heterogeneous union or a listhead + data. I have gone through that process for the exported image backend in mplib (which uses unions, see e.g. psout.w), and it was a fair amount of work even for that small chunk of code.
I am not saying that it shouldn't be done, just that it is not something that can be undertaken lightly. I already intend to make metapost's knots and knot lists use truly dynamic memory (a relatively minor task) but I have to find a quiet time.
In luatex the problem is more complicated because pascal&web2c does not like pointers much, so nothing is planned there until after the switch from pascal to C(web).
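Just to make the two options concrete, here is a rough sketch in plain C (with invented names, not the real luatex/mplib types) of what a heterogeneous-union node could look like once the packed memory words are gone:

  #include <stdlib.h>

  /* hypothetical node kinds, not the actual luatex/mplib tags */
  typedef enum { GLYPH_NODE, KERN_NODE, GLUE_NODE } node_type;

  typedef struct node {
      node_type type;
      struct node *next;      /* list link, replacing the packed "link" field */
      union {                 /* per-type payload, replacing the mem[] subfields */
          struct { int font, character; }        glyph;
          struct { int amount; }                 kern;
          struct { int width, stretch, shrink; } glue;
      } u;
  } node;

  static node *new_kern(int amount) {
      node *p = malloc(sizeof(node));   /* system allocator instead of get_node() */
      if (p == NULL) abort();
      p->type = KERN_NODE;
      p->next = NULL;
      p->u.kern.amount = amount;
      return p;
  }

The listhead + data alternative would keep only the type and link in a small generic header and point to a separately allocated payload, which avoids the union but costs an extra pointer indirection on every field access.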
- Why can't we use the IEEE floating-point specification?
Please do an internet search for "megapost metapost", and you will find some posts and presentations on the subject (most written by me).
Short summary: future versions of MPlib will incorporate either the MPFR library ( http://www.mpfr.org/ ) or decNumber ( http://www.alphaworks.ibm.com/tech/decnumber ).
I have yet to decide which one, both have their advantages.
TeX/MetaPost have their own number representation built in. But this is a very complicated and strange number system which represents each number as an integer. Isn't the IEEE floating-point specification good enough for implementing TeX?
Not really, because there are portability issues with respect to rounding. The normal C "double" data type is not acceptable without an additional library to make it behave reliably. And for MP, double is not really precise enough anyway, so it is better to jump to something more serious.
If we remove the dependency on part 7 and part 9, maybe we can also make our code more portable across different machines.
TeX's do-it-yourself integer calculus is the most portable you can get.
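For readers unfamiliar with it: TeX and MetaPost store quantities as "scaled" 32-bit integers with 16 fractional bits (so 1.0 is represented as 65536), which makes every operation exact and identical on every machine. A simplified illustration in C (the real TeX code avoids the 64-bit intermediate used here and stays entirely within 32-bit integer arithmetic):

  #include <stdint.h>
  #include <stdio.h>

  typedef int32_t scaled;        /* fixed point: 1.0 == 2^16 == 65536 */
  #define UNITY 65536

  /* multiply two scaled values; simplified with a 64-bit intermediate,
     unlike TeX's pure 32-bit version */
  static scaled scaled_mult(scaled a, scaled b) {
      return (scaled)(((int64_t)a * (int64_t)b) / UNITY);
  }

  int main(void) {
      scaled half  = UNITY / 2;  /* 0.5 */
      scaled three = 3 * UNITY;  /* 3.0 */
      scaled r = scaled_mult(half, three);
      printf("%d (= %.5f)\n", r, r / (double)UNITY);  /* 98304 (= 1.50000) */
      return 0;
  }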
- Why should we do memory management ourselves?
It is still there mostly for the same reason as mentioned above for the packed memory: it is a lot of work to change it.
Wouldn't it be great if we just asked the operating system's C library to handle these tasks (using malloc and free, as the example code of mp_new_fill_node shows)? A modern operating system's library is more efficient, and it would also make the LuaTeX code look better.
This is true for mp.w, but I am not so sure that the system malloc() will perform better than texnodes.c (which essentially uses 10 dedicated avail lists). Anyway, see the remarks above.
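For what it's worth, the avail-list idea boils down to something like the following (a hypothetical sketch, not the actual texnodes.c code): freed nodes of a given size class are pushed onto a per-size list and reused before malloc() is touched again.

  #include <stdlib.h>

  #define SIZE_CLASSES 10

  /* one free ("avail") list per node size class */
  static void *avail[SIZE_CLASSES];

  static void *node_alloc(int size_class) {
      void *p = avail[size_class];
      if (p != NULL) {
          avail[size_class] = *(void **)p;  /* pop a recycled node */
          return p;
      }
      /* nothing to recycle: fall back to the system allocator */
      return malloc((size_t)(size_class + 1) * sizeof(void *));
  }

  static void node_free(void *p, int size_class) {
      *(void **)p = avail[size_class];      /* push onto the free list */
      avail[size_class] = p;
  }

Reuse from such a list is only a couple of pointer operations and never enters the general-purpose allocator, which is why a scheme like this can still beat plain malloc()/free() when millions of small, uniform nodes are allocated and released.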
- From the part of the code I read, if I understand correctly, the only reason for incorporating fontforge is to get the font metrics data? If so, I think FreeType2 is sufficient to accomplish the task? [I think freetype2 is much smaller and more efficient, and fontforge itself uses freetype2]
Not only that. Stuff that needs doing:
* parsing of Type1 & Type0 (CID composite) fonts
* parsing of sfnt containers (ttc and dfont)
* parsing of TTF and OTF fonts (including AAT)
* bounding box calculations for PostScript-based glyphs (because this information is not in the font itself)
* parsing and processing of GPOS & GSUB tables
* font reencoding
* converting all of this to a Lua table
To be honest, I have no idea how many of those things can be done by freetype2, but I highly doubt its OTF support will be sufficient.
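(For the plain metrics part, something along these lines with FreeType2 would indeed work; the font path and character below are just placeholders, and it says nothing about GPOS/GSUB, reencoding, or the Lua table export listed above:)

  #include <stdio.h>
  #include <ft2build.h>
  #include FT_FREETYPE_H

  int main(void) {
      FT_Library lib;
      FT_Face face;

      if (FT_Init_FreeType(&lib))
          return 1;
      if (FT_New_Face(lib, "font.otf", 0, &face))   /* placeholder path */
          return 1;

      /* load a glyph unscaled so the metrics come back in font units */
      FT_UInt gid = FT_Get_Char_Index(face, 'A');
      if (FT_Load_Glyph(face, gid, FT_LOAD_NO_SCALE) == 0) {
          printf("advance %ld, height %ld (of %d units/em)\n",
                 (long)face->glyph->metrics.horiAdvance,
                 (long)face->glyph->metrics.height,
                 face->units_per_EM);
      }

      FT_Done_Face(face);
      FT_Done_FreeType(lib);
      return 0;
  }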