hi Taco:
- when I change something in fontforge/Unicode/* and run build.sh --make, it recompiles a lot of luatex code. That is not necessary. Please fix that if you can.
- cjk.c can be removed completely. Chinese TTF/OTF fonts are arranged in Unicode order (maybe with other encoding charmaps provided), so there is no need for font re-encoding. (I think Japanese and Korean fonts are too.)
- source/texk/web2c/luatexdir/luafontloader/fontforge/Unicode/backtrns.c is not needed, please remove that.
- maybe it is not necessary to extend ctype to utype: in ConTeXt we have char-def.lua, which gives very detailed information. After browsing the source code I think there is almost no dependency on the Unicode range for isxxxx and tolower/toupper. Most uses are for conversion of file names on the local filesystem and so on. After removing utype.[ch], only two APIs, ishexdigit and iscombinedchar, are missing, but these two are very easy to implement, e.g. hexdigit = 0-9, a-f.
- unialt.c is only needed for autohint.c. Since hinting has nothing to do with typesetting, perhaps these two files can go too...
After removing these files I ended up building a 3.9M luatex on Mac OS X. Maybe the Linux binary can be even smaller.
Of course, the above thoughts have not been thoroughly tested.
Yue Wang
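As an illustration of the "very easy to implement" remark, here is a minimal sketch of an ASCII-only hex digit test. The name ascii_ishexdigit is hypothetical and is not the function declared in utype.h, and accepting upper-case A-F is an assumption (the message only lists 0-9 and a-f). iscombinedchar is not sketched, because it presumably still needs some Unicode data.

```c
#include <stdbool.h>

/* Hypothetical ASCII-only replacement for a table-driven ishexdigit().
   Accepting 'A'..'F' as well as 'a'..'f' is an assumption. */
static bool ascii_ishexdigit(int ch)
{
    return (ch >= '0' && ch <= '9')
        || (ch >= 'a' && ch <= 'f')
        || (ch >= 'A' && ch <= 'F');
}
```

A range check like this needs no lookup table, which is the point of the suggestion: the cost is that code points outside ASCII are simply reported as "not a hex digit".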
After copying the stripped-down luatex binary into the context tree and testing
a few documents,
I am quite happy with my change.
btw, Taco, some text blocks are placed in the wrong places in the PDF
when using the unchanged version. Known issue?
Yue Wang wrote:
After copying the stripped-down luatex binary into the context tree and testing a few documents, I am quite happy with my change.
more on those changes later
btw, taco, some text blocks are placed in the wrong places in the pdf using the unchanged version. known issue?
that should have been fixed by #2529. If not, can you send me a test file please? Best wishes, Taco
Hi Yue Wang, Yue Wang wrote:
hi Taco:
- when I change something in fontforge/Unicode/* and run build.sh --make, it recompiles a lot of luatex code. That is not necessary. Please fix that if you can.
Sorry, I can't fix that (at least not right now). The dependencies are auto-generated and luatex's C library as a whole depends on libff.a.
- cjk.c can be removed completely. Chinese TTF/OTF Fonts are arranged in unicode order (maybe with other encoding charmap provided). So no need to do font re-encoding. (I think Japanese fonts and Korean fonts do too)
I have not actually removed the source code (just in case there is a problem discovered later) but I have completely hidden it from the compiler so that it is no longer compiled in the binary. So, Yanrui Li (and maybe for you as well, just to verify I did not mess up anything): if you want to run your tests, you only have to grab the current trunk and recompile. Probably the most important thing to test is whether searching in Acroread still works as it should.
- source/texk/web2c/luatexdir/luafontloader/fontforge/Unicode/backtrns.c is not needed, please remove that.
Done. I also removed the dump.c file that is used to generate some of these support data files.
- maybe it is not necessary to extend ctype to utype: in ConTeXt we have char-def.lua which gives very detailed information.
I have decided to keep that code: at some time in the future I want to expose the fontforge Unicode library to the lua scripting language. The current unicode library (slunicode) is minimalistic, already outdated, and hard to keep up-to-date, so it makes sense to switch to the much cleaner version from Fontforge at some point (not too soon though, it has a rather low priority).
- unialt.c is only needed for autohint.c. Since hinting has nothing to do with typesetting, perhaps these two files can go too...
Autohint.c (and tocff.c) are really needed: for some odd legacy fonts, I generate a CFF font on the fly. But unialt.c was used only for the FindBlues() function, and for that, the test for Unicode alternates was definitely overkill, so unialt.c is gone now.
After removing these files I ended up building a 3.9M luatex on Mac OS X. Maybe Linux binary can be even smaller.
The size of my cross-compiled windows binary dropped by some 750K thanks to all this. I can't easily check linux binary sizes because I always compile with debugging symbols on and optimization off (except for releases).
Of course, the above thoughts have not been thoroughly tested.
Best wishes, and thanks for digging up the information, Taco
Taco Hoekwater wrote:
if you want to run your tests, you only have to grab the current trunk and recompile. Probably the most important thing to test is whether searching in Acroread still works as it should.
Just in case, for those of you who use ConTeXt: don't forget to empty the font cache first. All this stuff only applies to the initial font loading stage. Best wishes, Taco
Hi, Taco:
With more extensive tests on Chinese and Korean fonts (40 or so fonts)
by Li Yanrui, Wang Longming, and me, we encountered no font
loading/embedding problems.
Text can still be copied and pasted correctly from the PDFs.
So this is quite a good change.
No idea about Japanese fonts (I don't speak Japanese),
but KozMinPr6N-Regular.otf (the only Japanese font Li Yanrui has) works.
I think Japanese users on the dev-luatex list can tell more about this change.
Yue Wang
I have decided to keep that code: at some time in the future I want to expose the fontforge Unicode library to the lua scripting language. The current unicode library (slunicode) is minimalistic, already outdated, and hard to keep up-to-date, so it makes sense to switch to the much cleaner version from Fontforge at some point (not too soon though, it has a rather low priority).
OK, I understand. But can you put tolower into the #ifdef too? tolower is only needed in macbinary.c, for a filename-related call, so it does not need to cover the full Unicode range.
Hi, Taco:
here is the patch.

Index: source/texk/web2c/luatexdir/luafontloader/fontforge/Unicode/utype.c
===================================================================
--- source/texk/web2c/luatexdir/luafontloader/fontforge/Unicode/utype.c (revision 2540)
+++ source/texk/web2c/luatexdir/luafontloader/fontforge/Unicode/utype.c (working copy)
@@ -1,5 +1,6 @@
 #include "utype.h"
 
+#if 0
 const unsigned short ____tolower[]= { 0,
   0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007,
   0x0008, 0x0009, 0x000a, 0x000b, 0x000c, 0x000d, 0x000e, 0x000f,
@@ -8195,7 +8196,6 @@
   0x0000, 0xfff9, 0xfffa, 0xfffb, 0xfffc, 0xfffd, 0x0000, 0x0000
 };
 
-#if 0
 const unsigned short ____toupper[] = { 0,
   0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007,
   0x0008, 0x0009, 0x000a, 0x000b, 0x000c, 0x000d, 0x000e, 0x000f,
Index: source/texk/web2c/luatexdir/luafontloader/fontforge/fontforge/macbinary.c
===================================================================
--- source/texk/web2c/luatexdir/luafontloader/fontforge/fontforge/macbinary.c (revision 2540)
+++ source/texk/web2c/luatexdir/luafontloader/fontforge/fontforge/macbinary.c (working copy)
@@ -1155,7 +1155,7 @@
     spt = strrchr(buffer,'/')+1;
     for ( pt=spt; *pt; ++pt )
         if ( isupper( *pt ))
-            *pt = tolower( *pt );
+            *pt = *pt - 'A' + 'a';
     dpt = strchr(spt,'.');
     if ( dpt==NULL ) dpt = spt+strlen(spt);
     if ( dpt-spt>8 || strlen(dpt)>4 ) {
Index: source/texk/web2c/luatexdir/luafontloader/fontforge/inc/utype.h
===================================================================
--- source/texk/web2c/luatexdir/luafontloader/fontforge/inc/utype.h (revision 2540)
+++ source/texk/web2c/luatexdir/luafontloader/fontforge/inc/utype.h (working copy)
@@ -47,14 +47,14 @@
 #define ____TOUCHING 0x100000
 #define ____COMBININGPOSMASK 0x1fff00
 
+#if 0
 extern const unsigned short ____tolower[];
-#if 0
 extern const unsigned short ____toupper[];
 #endif
 extern const unsigned int ____utype[];
 
+#if 0
 #define tolower(ch) (____tolower[(ch)+1])
-#if 0
 #define toupper(ch) (____toupper[(ch)+1])
 #endif
 #define islower(ch) (____utype[(ch)+1]&____L)

(and personally I think ____utype can go too... such a Unicode library can very easily be implemented in pure Lua.)
Yue Wang
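The macbinary.c hunk above replaces a table lookup with plain ASCII arithmetic behind an upper-case test. A minimal stand-alone sketch of that idea follows. The function name lowercase_basename_ascii is hypothetical, the explicit 'A'..'Z' range check stands in for the isupper() call in the patch, and the NULL check on strrchr() is an addition that the quoted code does not have.

```c
#include <string.h>

/* Lower-case the file name part of a path in place, ASCII only.
   Hypothetical helper modelled on the loop in the macbinary.c hunk. */
static void lowercase_basename_ascii(char *path)
{
    char *slash = strrchr(path, '/');
    /* Without a '/', treat the whole string as the file name. */
    char *pt = (slash != NULL) ? slash + 1 : path;

    for (; *pt != '\0'; ++pt) {
        /* Guard: only upper-case ASCII letters are shifted. */
        if (*pt >= 'A' && *pt <= 'Z')
            *pt = (char)(*pt - 'A' + 'a');
    }
}
```

The arithmetic is only safe because of the guard in front of it: applied to any other character, `*pt - 'A' + 'a'` would produce an unrelated value.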
Yue Wang wrote:
OK. I understand. but can you put tolower into #ifdef too? tolower is only needed for macbinary.c for a filename related call.
It is also used by the strmatch() function collection in Unicode/char.c, whose functions are themselves used in various places all over the source. I have applied the patch for now (after checking all actual usages of those functions to make sure they do not need Unicode), but I hope you see why this gets problematic? I will have to revert it at the first instance of actual Unicode strings that need to be compared. Best wishes, Taco
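To make the dependency concrete, here is a minimal sketch of a case-insensitive comparison built on an ASCII-only fold. It is not the code from Unicode/char.c; the names ascii_fold and ascii_strmatch are hypothetical, and the return convention (0 for a match) is an assumption.

```c
/* Hypothetical ASCII-only fold: anything outside 'A'..'Z' is returned unchanged. */
static int ascii_fold(int ch)
{
    return (ch >= 'A' && ch <= 'Z') ? ch - 'A' + 'a' : ch;
}

/* Case-insensitive comparison in the style of a strmatch() helper.
   Returns 0 when the strings match, otherwise the difference of the
   first pair of folded bytes that differ. */
static int ascii_strmatch(const char *s1, const char *s2)
{
    for (;;) {
        int c1 = ascii_fold((unsigned char)*s1++);
        int c2 = ascii_fold((unsigned char)*s2++);
        if (c1 != c2 || c1 == '\0')
            return c1 - c2;
    }
}
```

With this fold, "UTF-8" and "utf-8" compare equal, but the UTF-8 byte sequences for "É" and "é" do not, which is the kind of case where an ASCII-only tolower stops being enough.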
On Sat, Jun 20, 2009 at 3:15 PM, Taco Hoekwater
Yue Wang wrote:
OK. I understand. but can you put tolower into #ifdef too? tolower is only needed for macbinary.c for a filename related call.
It is also used by the strmatch() function collection in Unicode/char.c, which themselves are used in various places all over the source.
I have applied the patch for now (after checking all actual usages of those functions to make sure they do not need unicode) but I hope you see why this gets problematic? I will have to revert it back at the first instance of actual unicode strings that need to be compared.
well, that's OK. 200K of size can be ignored, now that 500G hard drives are pretty cheap...
Best wishes, Taco
On Sat, Jun 20, 2009 at 09:15:10AM +0200, Taco Hoekwater wrote:
Yue Wang wrote:
OK. I understand. but can you put tolower into #ifdef too? tolower is only needed for macbinary.c for a filename related call.
It is also used by the strmatch() function collection in Unicode/char.c, which themselves are used in various places all over the source.
I have applied the patch for now (after checking all actual usages of those functions to make sure they do not need unicode) but I hope you see why this gets problematic? I will have to revert it back at the first instance of actual unicode strings that need to be compared.
SVN revision 2541 doesn't even work for me, I get a FontForge error whenever I run luatex, even with no files at all:
FontForge does not support your encoding (UTF-8), it will pretend the local encoding is latin1
Internal Error: I can't figure out your version of iconv(). I need a name for the UCS-4 encoding and I can't find one. Reconfigure --without-iconv. Bye.
Regards, Khaled
--
Khaled Hosny
Arabic localiser and member of Arabeyes.org team
Free font developer
that's because Taco's version of string match is buggy.
no, I found out that my version of tolower is buggy...
a character that is already lower case must not be lowered again.
for macbinary.c it's OK, but not for strmatch.
Yue Wang wrote:
no, I find out that my version of tolower is buggy... lower case can not be lower case again. for macbinary.c it's ok, but not for strmatch.
On Sat, Jun 20, 2009 at 4:56 PM, Yue Wang
wrote: that's because Taco's version of string match is buggy.
On Sat, Jun 20, 2009 at 4:38 PM, Khaled Hosny
wrote: SVN revision 2541 doesn't even work for me, I get a FontForge error whenever I run luatex, even with no files at all:
#2542 simply reverts #2541. Best wishes, Taco
Hi, Taco and Khaled
#2542 simply reverts #2541.
first of all, you left two typos there, so it will simply not work. What I wrote is
*pt = *pt - 'A' + 'a'; // to lower
however, you wrote it like this:
#define tolower(ch) (ch+'A'-'a')
which is a "toupper" statement. Moreover, the statement I left in macbinary.c is safe because there is an "isupper" test in front of it. However, it is totally wrong to put the same expression into utype.h: utype.h should check whether the character is in the [A-Z] range or not. So this is not because my patch sucks, but because you wrote the wrong statement...
Here is the patch for 2541:

Index: source/texk/web2c/luatexdir/luafontloader/fontforge/inc/utype.h
===================================================================
--- source/texk/web2c/luatexdir/luafontloader/fontforge/inc/utype.h (revision 2541)
+++ source/texk/web2c/luatexdir/luafontloader/fontforge/inc/utype.h (working copy)
@@ -58,7 +58,7 @@
 #define toupper(ch) (____toupper[(ch)+1])
 #else
 /* ASCII style */
-#define tolower(ch) (ch+'A'-'a')
+#define tolower(ch) ((ch >= 'A' && ch <= 'Z') ? ch + 32: ch)
 #endif
 #define islower(ch) (____utype[(ch)+1]&____L)
 #define isupper(ch) (____utype[(ch)+1]&____U)

which should do the trick.
Best wishes, Taco
Yue Wang
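The difference between the two definitions discussed in this message can be checked with a small sketch. Both function names are hypothetical: fold_as_committed uses the expression from the removed macro line, and fold_guarded uses the range check from the added line, written as a function so that the argument is evaluated once.

```c
#include <ctype.h>

/* Expression from the "-" line of the patch: shifts every input by
   'A' - 'a', i.e. towards upper case, with no range check. */
static int fold_as_committed(int ch)
{
    return ch + 'A' - 'a';
}

/* Expression from the "+" line of the patch: only 'A'..'Z' are shifted. */
static int fold_guarded(int ch)
{
    return (ch >= 'A' && ch <= 'Z') ? ch + 32 : ch;
}

/* Returns 1 when fold_guarded() agrees with the C library tolower()
   for every 7-bit ASCII value (in the default "C" locale). */
static int guarded_matches_libc(void)
{
    for (int c = 0; c < 128; ++c)
        if (fold_guarded(c) != tolower(c))
            return 0;
    return 1;
}
```

With the unguarded expression, 'u' becomes 'U' and 'U' becomes '5', so two spellings of the same name that differ only in case no longer fold to the same string. The guarded form behaves like the standard tolower() for ASCII input and leaves everything else alone.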
Yue Wang wrote:
Hi, Taco and Khaled
#2542 simply reverts #2541.
first of all, you left two typos there, so it will simply not work:
Yes, I know I messed up your patch and that it was not your fault, but nevertheless I changed my mind and now prefer to keep using the unicode version of tolower(). The reason is this: any imports I make from newer versions of fontforge (and that is likely) will assume the unicode version of tolower to be present. Using a different definition could result in bugs that are very hard to find. Best wishes, Taco