[luatex-fonts] non-ascii filenames in font cache
Hi Hans,

the font cache currently drops non-ascii bytes when creating file names by means of containers.cleanname(). Dohyun Kim sent a fix for data-con.lua (see below). My own test with the unicode library leads to some odd results.

Also I noticed that as a pattern, [^%w%d] is a bit redundant since %d is a subset of %w in both string and unicode.utf8.

Regards
Philipp

    #!/usr/bin/env texlua

    local non_ascii_names = {
      [[华文仿宋.ttf]],
      [[华文细黑.ttf]],
      [[华文黑体.ttf]],
    }

    --- [a]: current data-con
    --- [b]: include non-ascii (proposed by Dohyun Kim)
    --- [c]: with selene unicode

    for i = 1, #non_ascii_names do
      local name = non_ascii_names[i]
      print""
      print("[a]", name, string.gsub(string.lower(name), "[^%w%d]+","-"))
      print("[b]", name, string.gsub(string.lower(name), "[^%w%d\128-\255]+","-"))
      print("[c]", name, unicode.utf8.gsub(unicode.utf8.lower(name), "[^%w%d]+","-"))
    end
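The remark that %d is a subset of %w can be checked byte by byte for the stock string library. This is only a sketch: it assumes plain Lua in the default C locale, it does not test unicode.utf8, and `digits_outside_alnum` is a hypothetical helper name made up for the illustration.

```lua
-- Collect every byte that matches %d but does NOT match %w.
-- If %d is a subset of %w, the resulting list is empty.
-- (digits_outside_alnum is a hypothetical helper, not part of data-con.lua.)
local function digits_outside_alnum()
  local offenders = {}
  for byte = 0, 255 do
    local ch = string.char(byte)
    if ch:find("%d") and not ch:find("%w") then
      offenders[#offenders + 1] = byte
    end
  end
  return offenders
end

print(#digits_outside_alnum())

-- With an empty list, "[^%w%d]+" and "[^%w]+" behave identically.
local sample = "font 01_test.ttf"
local with_d    = (sample:gsub("[^%w%d]+", "-"))
local without_d = (sample:gsub("[^%w]+", "-"))
print(with_d == without_d)
```

An empty list means the %d inside the character class can be dropped without changing the result.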
On 4/28/2013 12:04 PM, Philipp Gesang wrote:
> the font cache currently drops non-ascii bytes when creating file names by means of containers.cleanname(). Dohyun Kim sent a fix for data-con.lua (see below). My own test with the unicode library leads to some odd results.

strange that it wasn't noticed before as it's rather old code

    function containers.cleanname(name)
        return (gsub(lower(name),"[^%w\128-\255]+","-"))
    end

is good enough i guess

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
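To see what the revised pattern changes, the old and new character classes can be compared side by side. This is a minimal sketch, assuming stock Lua in the default C locale and a UTF-8 encoded source file. `cleanname_old` and `cleanname_new` are hypothetical stand-ins for illustration; only the two patterns are taken from the thread.

```lua
local gsub, lower = string.gsub, string.lower

-- Pattern reported as current in data-con: every run of bytes outside %w
-- (which, in the C locale, includes all bytes >= 128) collapses to "-".
local function cleanname_old(name)
  return (gsub(lower(name), "[^%w%d]+", "-"))
end

-- Pattern from the proposed fix: bytes 128-255 are kept, so UTF-8
-- sequences survive untouched.
local function cleanname_new(name)
  return (gsub(lower(name), "[^%w\128-\255]+", "-"))
end

local a, b = "华文仿宋.ttf", "华文细黑.ttf"

print(cleanname_old(a))  --> -ttf
print(cleanname_old(b))  --> -ttf
print(cleanname_new(a))  --> 华文仿宋-ttf
print(cleanname_new(b))  --> 华文细黑-ttf
```

With the old pattern both file names reduce to the same cache name, so distinct fonts would collide. With the new one they stay distinct.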
> On 4/28/2013 12:04 PM, Philipp Gesang wrote:
> > the font cache currently drops non-ascii bytes when creating file names by means of containers.cleanname(). Dohyun Kim sent a fix for data-con.lua (see below). My own test with the unicode library leads to some odd results.
>
> strange that it wasn't noticed before as it's rather old code

Personally I would rename the files instead of reporting it.

>     function containers.cleanname(name)
>         return (gsub(lower(name),"[^%w\128-\255]+","-"))
>     end
>
> is good enough i guess

Of course, thanks!

Philipp
On Sun, Apr 28, 2013 at 12:56:25PM +0200, Hans Hagen wrote:
> On 4/28/2013 12:04 PM, Philipp Gesang wrote:
> > the font cache currently drops non-ascii bytes when creating file names by means of containers.cleanname(). Dohyun Kim sent a fix for data-con.lua (see below). My own test with the unicode library leads to some odd results.
>
> strange that it wasn't noticed before as it's rather old code

I noticed it long ago (by reading the code), but since I didn't have any fonts with non-ASCII filenames, I didn't bother.

Regards,
Khaled
On 28.04.2013 at 14:08, Khaled Hosny wrote:
> On Sun, Apr 28, 2013 at 12:56:25PM +0200, Hans Hagen wrote:
> > On 4/28/2013 12:04 PM, Philipp Gesang wrote:
> > > the font cache currently drops non-ascii bytes when creating file names by means of containers.cleanname(). Dohyun Kim sent a fix for data-con.lua (see below). My own test with the unicode library leads to some odd results.
> >
> > strange that it wasn't noticed before as it's rather old code
>
> I noticed it long ago (by reading the code), but since I didn't have any fonts with non-ASCII filenames, I didn't bother.

IIRC this was on purpose because there had been problems when fonts used non-ascii characters.

Wolfgang
On 4/28/2013 2:15 PM, Wolfgang Schuster wrote:
> On 28.04.2013 at 14:08, Khaled Hosny wrote:
> > On Sun, Apr 28, 2013 at 12:56:25PM +0200, Hans Hagen wrote:
> > > On 4/28/2013 12:04 PM, Philipp Gesang wrote:
> > > > the font cache currently drops non-ascii bytes when creating file names by means of containers.cleanname(). Dohyun Kim sent a fix for data-con.lua (see below). My own test with the unicode library leads to some odd results.
> > >
> > > strange that it wasn't noticed before as it's rather old code
> >
> > I noticed it long ago (by reading the code), but since I didn't have any fonts with non-ASCII filenames, I didn't bother.
>
> IIRC this was on purpose because there had been problems when fonts used non-ascii characters.

indeed, and as this patch only involves caching it means that the problem moves elsewhere

Hans
participants (5)
- Hans Hagen
- Khaled Hosny
- Philipp Gesang
- Philipp Gesang
- Wolfgang Schuster