Hi, can one of the experienced Lua programmers here think about a special version of format()? This function is generally of great help (short, readable code), but in combination with directly written PDF code (as in mlib-pdf.lua) it has a major drawback: all numbers are written with a fixed width (as is normally intended for formatted output). As a result you get '0.000000' instead of a short '0'. Look at an uncompressed PDF and you will see how much space is wasted in total. Here is an example (the PDF code segment of an MP graphic):

% mps graphic 1: begin
q
q
/Pattern cs
10.000000 M
1 j
0.000000 0.000000 m
94.486725 0.000000 188.977783 0.000000 283.464508 0.000000 c
283.464508 28.346451 l
188.977783 28.346451 94.486725 28.346451 0.000000 28.346451 c
0.000000 0.000000 l
W n
/MpSh1 sh
Q
0 g 0 G
1 J
0.100006 w
0.000000 28.346451 m
283.464508 28.346451 l
S
0 g 0 G
0 g 0 G
0.100006 w
283.096863 28.194168 m
283.464508 28.346451 l
283.096863 28.498734 l
283.096863 28.194168 l
h
B
0 g 0 G
Q
% mps graphic 1: end

With compression enabled things don't look that bad (regarding file size), but it's still an unnecessary waste. To summarize: the new function should act like 'format', but it should use the minimal number of digits. There should also be a way to adjust the number of digits after the point (current accuracy for floats: 1/1000000 pt).

Any thoughts?

Regards, Peter
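The behaviour Peter asks for can be sketched one number at a time. The approach is to format with a fixed number of decimals, then drop trailing zeros and a dangling period. This is a minimal sketch, not code from mlib-pdf.lua. The helper name `pdfnum` and its `digits` parameter are hypothetical; `digits` is expected to be a non-negative integer and defaults to 6, which matches plain %f.

```lua
-- Hypothetical helper, not part of ConTeXt: write a number for a PDF
-- content stream with at most `digits` decimals and no superfluous zeros.
local function pdfnum(x, digits)
  digits = digits or 6                     -- plain %f also uses 6 decimals
  local s = string.format("%." .. digits .. "f", x)
  if s:find(".", 1, true) then             -- only strip after a period
    s = s:gsub("0+$", "")                  -- drop trailing zeros
    s = s:gsub("%.$", "")                  -- drop a dangling period
  end
  if s == "-0" then
    s = "0"                                -- avoid writing a negative zero
  end
  return s
end

print(pdfnum(0))              --> 0
print(pdfnum(283.464508))     --> 283.464508
print(pdfnum(28.346451, 3))   --> 28.346
print(pdfnum(0.100006, 3))    --> 0.1
```

Because the sketch starts from %f-style output, it never produces exponent notation. The price is two extra gsub calls per number.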
Hello,

for a string with a single float you can play a bit with the output of string.format. The following few lines strip all the trailing zeroes from strings that look like a formatted float:

====
function string.optimize_format(form, ...)
  local formatted_string = string.format(form, ...)
  -- Lua patterns escape magic characters with '%', so '%.' is a literal period
  local optimized_string = formatted_string:gsub('^(%d*%.%d-)(0*)$', '%1')
  return optimized_string
end

print(string.optimize_format("%f", 0.003))     -- 0.003
print(string.optimize_format("%f", 0.000007))  -- 0.000007
print(string.optimize_format("%d", 120))       -- 120
print(string.optimize_format("Hello, world!")) -- unchanged
====

but of course this doesn't work if you have a format like "%s %s l"; you would have to parse all the arguments to format.

Arthur
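One way around the single-float limitation is to post-process the whole formatted string instead of each argument. Below is a minimal sketch. It assumes numbers appear as plain `digits.digits` tokens, which is what %f produces. The names `strip_zeros` and `optimized_format` are hypothetical. This is exactly the kind of postprocessing whose CPU cost is questioned later in the thread.

```lua
-- Strip trailing zeros from every "digits.digits" token in an already
-- formatted string, so multi-argument formats like "%f %f l" work too.
local function strip_zeros(s)
  return (s:gsub("(%d+)%.(%d+)", function(int, frac)
    frac = frac:gsub("0+$", "")            -- trailing zeros of the fraction
    if frac == "" then
      return int                            -- "28.000000" becomes "28"
    end
    return int .. "." .. frac
  end))
end

-- Hypothetical wrapper: format as usual, then strip
local function optimized_format(form, ...)
  return strip_zeros(string.format(form, ...))
end

print(optimized_format("%f %f l", 283.464508, 0))  --> 283.464508 0 l
print(optimized_format("%f w", 0.1))               --> 0.1 w
```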
local optimized_string = formatted_string:gsub('^(%d*%.%d-)(0*)$', '%1')
Actually, I just realized that the primitive tostring does exactly the same transformation (probably in a much safer way), so we can use it in the much more complete code:

====
-- Thank you, Lua-users wiki :-)
-- http://lua-users.org/wiki/VarargTheSecondClassCitizen Issue #7

-- The first two functions below implement a map over a "..." list
local function map_rec(f, n, a, ...)
  if n > 0 then
    return f(a), map_rec(f, n-1, ...)
  end
end

local function map(f, ...)
  return map_rec(f, select('#', ...), ...)
end

-- Our "better formatting" function
function string.better_format(form, ...)
  -- Every number-kind-of-format -> %s
  form = form:gsub('%%[0-9]-%.?[0-9]-f', '%%s')
  form = form:gsub('%%d', '%%s')
  form = form:gsub('%%i', '%%s')
  -- Then make every argument in the list into a string
  return (string.format(form, map(tostring, ...)))
end

-- Now Lua will happily strip the trailing zeroes
print(string.better_format("%f %f m", 94.486000, 30.000))
print(string.better_format("%5f %.7f l", .577, 2.7828))
print(string.better_format("%5f %.7f %d l", 3.14000, 2.7828, 3))
====

Arthur
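There are caveats to relying on tostring. In the stock Lua builds it formats floats with "%.14g" (LUAI_NUMFFORMAT). That keeps up to 14 significant digits and switches to exponent notation for very small magnitudes, which PDF number syntax does not allow. In Lua 5.3 and later, a float with an integral value such as 30.000 also prints as "30.0" rather than "30". A quick check follows; `pdf_safe` is a hypothetical helper name.

```lua
-- tostring() on a float uses LUAI_NUMFFORMAT ("%.14g" in stock builds)
print(tostring(94.486000))   --> 94.486
print(tostring(1/3))         --> 0.33333333333333 (14 significant digits)
print(tostring(0.000007))    -- exponent notation, e.g. 7e-06

-- Hypothetical guard: PDF numbers must not use exponent notation
local function pdf_safe(s)
  return s:find("[eE]") == nil
end

print(pdf_safe(tostring(94.486)))    --> true
print(pdf_safe(tostring(0.000007)))  --> false
```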
Arthur Reutenauer wrote:
this is a bit of a weird case ... on the one hand you specify %5f and such, but that's ignored, i.e. it becomes %s, so why not use %s in the first place then

so, you could have said

print(string.format("%s %s m", 94.486000, 30.000))
print(string.format("%s %s l", .577, 2.7828))
print(string.format("%s %s %s l", 3.14000, 2.7828, 3))

which is then way faster

anyway, originally i used %s but when Taco and i played with the converter and did some performance tests we found out that %f is faster (unless > 6 digits specified)

a more general speed improvement is to set the pdf compression to 3

i may add a stripper (a more general one) once i reimplement the backend (all backend stuff in mkiv is temporary)

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
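The speed question can be checked with a rough timing harness like the one below. It is a sketch only. `time_format` is a hypothetical helper and the iteration count is arbitrary. The timings depend on the Lua version and the machine, so no figures are implied here. The final assertion shows that, for plain numbers, "%s" produces the same text as tostring.

```lua
-- Rough timing sketch: format the same two coordinates n times with a
-- given format string and return the elapsed CPU time in seconds.
local function time_format(form, n)
  local x, y = 283.464508, 28.346451
  local t0 = os.clock()
  for _ = 1, n do
    string.format(form, x, y)
  end
  return os.clock() - t0
end

local n = 200000  -- arbitrary iteration count
print("%f",   time_format("%f %f l", n))
print("%.3f", time_format("%.3f %.3f l", n))
print("%s",   time_format("%s %s l", n))

-- For plain numbers "%s" gives the same text as tostring()
assert(string.format("%s", 0.1) == tostring(0.1))
```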
this is a bit of a weird case ... on the one hand you specify %5f and such, but that's ignored, i.e. it becomes %s, so why not use %s in the first place then
Actually, this was precisely my intention: to show that one could use the original format string with different formatting requirements (I don't know where that format string is going to come from, whether Peter is going to type it or whether it's going to be inserted by some lower-level function). But I agree that if the user is going to type it himself, he'd better use %s for convenience (Lua is so tolerant anyway).
anyway, originally i used %s but when taco and i played with the converter and did some performance tests we found out that %f is faster (unless > 6 digits specified)
I suppose that when Lua sees a %s format with an argument that is not a string, it calls tostring, which must be much slower.

Arthur
Arthur Reutenauer wrote:
this is a bit of a weird case ... on the one hand you specify %5f and such, but that's ignored, i.e. it becomes %s, so why not use %s in the first place then
Actually, this was precisely my intention: to show that one could use the original format string with different formatting requirements (I don't know where that format string is going to come from, whether Peter is going to type it or whether it's going to be inserted by some lower-level function). But I agree that if the user is going to type it himself, he'd better use %s for convenience (Lua is so tolerant anyway).
sure; in this case the %f is deep down in the mkiv code
anyway, originally i used %s but when taco and i played with the converter and did some performance tests we found out that %f is faster (unless > 6 digits specified)
I suppose that when Lua sees a %s format with an argument that is not a string, it calls tostring, which must be much slower.
i think that for any conversion it has to call a function (take hex), but tostring may be costly due to metatable access; also it seems that tostring does some stripping

Hans
Arthur Reutenauer wrote:
local optimized_string = formatted_string:gsub('^(%d*%.%d-)(0*)$', '%1')
okay then, let's go educational ... a better stripper:

local lpeg    = lpeg or require("lpeg") -- LuaTeX preloads lpeg; plain Lua needs require
local digit   = lpeg.R("09")
local period  = lpeg.P(".")
local zero    = lpeg.P("0")
local finish  = lpeg.P(-1)
local nodigit = (1-digit) + finish
local number  = digit^1 * ((period * zero^1 * #nodigit)/"")
              + (period * (1-zero)^0 * (zero^1/"" + digit^1) * nodigit)
local stripper = lpeg.Cs((number + 1)^0)

local sample = "bla bla 0.11100000 bla bla 0.00000 bla 0.00001 bla bla bla 10.11100000 bla bla 1.00000 bla 0.00001 bla"

collectgarbage("collect")
local str = string.rep(sample, 10000)
local ts = os.clock()
stripper:match(str)
print(#str, os.clock()-ts, stripper:match(sample))
Peter Rolf wrote:
Hi,
can one of the experienced lua programmers here think about a special version of format()? This function in general is of great help (short, good readable code), but in combination with directly written pdf code (like in mlib-pdf.lua) it has a major drawback. All numbers are written with fixed size (as normally intended in formatted output). As a result you get '0.000000' instead of a short '0'. Look in an uncompressed pdf and you will see how much space is wasted there in total.
it's a trade-off; %g is not usable because we then get e notation, and postprocessing is not an option either because it's a waste of cpu cycles
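To see why %g is ruled out: its default precision is six significant digits rather than six decimals. Small magnitudes also switch to exponent notation, which the PDF specification does not allow for numbers. A quick check:

```lua
-- "%g" counts significant digits (default 6), not decimals
print(string.format("%g", 0.0))          --> 0
print(string.format("%g", 283.464508))   --> 283.465   (a digit is lost)
print(string.format("%g", 0.000007))     -- exponent form, e.g. 7e-06

-- "%f" keeps six decimals and never uses an exponent
print(string.format("%f", 283.464508))   --> 283.464508
```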
With enabled compression things don't look that bad (regarding file size), but it's still an unnecessary waste.
indeed, such sequences compress well

adding a special formatter function has a low priority .. maybe some day

Hans
Hans Hagen wrote:
Peter Rolf wrote:
Hi,
can one of the experienced lua programmers here think about a special version of format()? This function in general is of great help (short, good readable code), but in combination with directly written pdf code (like in mlib-pdf.lua) it has a major drawback. All numbers are written with fixed size (as normally intended in formatted output). As a result you get '0.000000' instead of a short '0'. Look in an uncompressed pdf and you will see how much space is wasted there in total.
it's a trade off; %g is not usable because we then get e notation
and postprocessing is not an option either because it's a waste of cpu cycles
stupid me (a much simpler approach is possible). i just looked deeper into mlib-pdf.lua and simply used '%.3f' in all places where coordinates/measures are written; the same for integers ('%.0f', or '%.1f' as in the pdf reference). much shorter output and better readability of the unpacked pdf. so there is no real need for an *optimal* solution with an adapted format function. sorry for the noise.
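The size effect of Peter's change can be seen on a single moveto from the example at the top of the thread. The numbers are taken from that example, and the comparison is illustrative only.

```lua
-- One "m" (moveto) operator from the mps example, written two ways
local x, y = 94.486725, 0
local long  = string.format("%f %f m", x, y)     -- default %f: six decimals
local short = string.format("%.3f %.3f m", x, y) -- Peter's %.3f

print(#long, long)    -- 20 bytes: 94.486725 0.000000 m
print(#short, short)  -- 14 bytes: 94.487 0.000 m
```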
With enabled compression things don't look that bad (regarding file size), but it's still an unnecessary waste.
indeed, such sequences compress well
yes <sigh>. i optimized some code and saved around 18k in the final uncompressed pdf (3% smaller). took me some time and i was a little proud of myself. after compression, all the saving that was left was 1,111 bytes. <sigh again> :)

but this is not only a file size issue. you have to represent the data in memory somehow. less memory usage and less time for data scanning mean faster viewing.
adding special formatter function has a low priority .. maybe some day
Hans
so can you please simply limit the number of digits after the point in mlib-pdf.lua? you have already done this for colors at the end of the source. if i have to patch one more file, i can make my own distribution ;)

regards, peter
Peter Rolf wrote:
just looked deeper in mlib-pdf.lua. i simply added '%.3f' on all positions, where coordinates/measures are written. same for integer ('%.0f' or as in the pdf reference '%.1f'). much shorter and better
integers are %i and floats %f (which defaults to %.6f)

we really need at least 6 digits of precision with graphics; for colors 3 digits is ok
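Both points can be checked directly. Plain %f is the same as %.6f. Three decimals give steps of 0.001 for color components in the 0 to 1 range. The `rg` line below is only an illustration of a DeviceRGB fill color operator, not the exact code in mlib-pdf.lua.

```lua
-- %f without an explicit precision means six decimals
assert(string.format("%f", 0.1) == string.format("%.6f", 0.1))
print(string.format("%f", 0.1))          --> 0.100000

-- %i for integers
print(string.format("%i", 3))            --> 3

-- Colors: three decimals, e.g. an RGB fill color (illustrative)
print(string.format("%.3f %.3f %.3f rg", 0.5, 0.25, 1))  --> 0.500 0.250 1.000 rg
```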
readability of the unpacked pdf. so there is no real need for an *optimal* solution with an adapted format function. sorry for the noise.
With enabled compression things don't look that bad (regarding file size), but it's still an unnecessary waste. indeed, such sequences compress well
yes <sigh>. i optimized some code and saved around 18k in the final uncompressed pdf (3% smaller). took me some time and i was a little proud of myself. after compression, all the saving that was left was 1,111 bytes. <sigh again> :)
so your document is 99% graphics then?

actually there are a few other optimizations i want to do (cm and such), but this is also related to literal processing in general (i need that for runtime-generated fonts, because (esp. when we randomize too) one easily gets megs of inline font data)
but this is not only a file size issue. you have to represent the data in some way in memory. less memory usage, less time for data scanning means faster viewing.
negligible i guess; dealing with color spaces and such takes way more runtime
adding special formatter function has a low priority .. maybe some day
Hans
so can you please simply limit the number of digits after the point in mlib-pdf.lua? you have already done this for colors at the end of the source. if i have to patch one more file, i can make my own distribution ;)
i'm not going to change this, so i fear that you will end up with your own low-res version

Hans
Hans Hagen wrote:

a better one

local lpeg    = lpeg or require("lpeg") -- LuaTeX preloads lpeg; plain Lua needs require
local digit   = lpeg.R("09")
local period  = lpeg.P(".")
local zero    = lpeg.P("0")
local nozero  = 1 - zero
local finish  = lpeg.P(-1)
local nodigit = (1-digit) + finish
local case_1  = (period * zero^1 * #nodigit)/"" -- .000
local case_2  = (period * (1-(zero^0/"") * #nodigit)^1 * (zero^0/"") * nodigit) -- .010 .10 .100100
local number  = digit^1 * (case_1 + case_2)
local stripper = lpeg.Cs((number + 1)^0)

function aux.strip_zeros(str)
  return stripper:match(str)
end
Hans Hagen wrote:
Peter Rolf wrote:
just looked deeper in mlib-pdf.lua. i simply added '%.3f' on all positions, where coordinates/measures are written. same for integer ('%.0f' or as in the pdf reference '%.1f'). much shorter and better
integers are %i and floats %f (which defaults to %.6f)
we really need at least 6 digits precision with graphics; for colors 3 digits is ok
you really need a precision of a millionth of a point? or do you need it in combination with other units (cm, m)? me no pdf expert. anyhow, my graphics look similar with only three digits after the point. that saves me another 5% in the compressed pdf.
readability of the unpacked pdf. so there is no really need for an *optimal* solution with an adapted format function. sorry for the noise.
With enabled compression things don't look that bad (regarding file size), but it's still an unnecessary waste. indeed, such sequences compress well
yes <sigh>. i optimized some code and saved around 18k in the final uncompressed pdf (3% smaller). took me some time and i was a little proud of myself. after compression, all the saving that was left was 1,111 bytes. <sigh again> :)
so your document is 99% graphics then?
yes. only some text in the graphic. people don't like to read much text in a gui :)
actually there are a few other optimizations i want to do (cm and such), but this is also related to literal processing in general (i need that for runtime-generated fonts, because (esp. when we randomize too) one easily gets megs of inline font data)
but this is not only a file size issue. you have to represent the data in some way in memory. less memory usage, less time for data scanning means faster viewing.
negligible i guess; dealing with color spaces and such takes way more runtime
adding special formatter function has a low priority .. maybe some day
Hans
so can you please simply limit the number of digits after the point in mlib-pdf.lua? you have already done this for colors at the end of the source. if i have to patch one more file, i can make my own distribution ;)
i'm not going to change this so i fear that you will end up with your own low-res version
ok, it can't be helped. so it's one more file to patch. anyhow, thanks to you and Arthur for your answers. i'm still trying to understand lpeg (i started once, but never used it), so this is quite interesting stuff.

regards, peter
Hans
Peter Rolf wrote:
you really need a precision of a millionth of a point? or do you need it in combination with other units (cm, m)? me no pdf expert. anyhow, my graphics look similar with only three digits after the point. that saves me another 5% in the compressed pdf.
it depends a bit, but it matters if you make little graphics and do a lot of scaling; long ago i played with precision (you can set this in pdftex too, \pdfdecimaldigits=5 by default). there's nothing as annoying as things just not fitting (there are enough inaccurate apps out there)

Hans
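Here is a small illustration of the scaling argument, using made-up numbers. A coordinate rounded to a given number of decimals carries an error, and any later scaling multiplies that error. `scaled_error` is a hypothetical helper. The value and scale factor are arbitrary.

```lua
-- Hypothetical helper: error after rounding x to `digits` decimals,
-- multiplied by a later scale factor
local function scaled_error(x, digits, scale)
  local rounded = tonumber(string.format("%." .. digits .. "f", x))
  return math.abs(rounded - x) * scale
end

local x, scale = 1.23456789, 100         -- arbitrary example values
print(scaled_error(x, 3, scale))         -- about 0.043 (pt, if x is in pt)
print(scaled_error(x, 6, scale))         -- about 0.000011
```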
participants (3)
- Arthur Reutenauer
- Hans Hagen
- Peter Rolf