Jonathan Sauer wrote:
Hello,
slightly more efficient ... (char and byte accept multiple arguments, and you can use write, which is faster too)
Thanks!
in context i use something quick and dirty (no > 0x10000 checking, but from your function i can deduce the magic numbers :-)
Well, they are simply the ones mentioned in http://en.wikipedia.org/wiki/Utf-16.
sure, but i didn't realize that a simple / worked ok (i expected trunc/round stuff and so)

anyhow, a helper function in luatex would be handy, not that this is such a critical issue; in a tex run hardly any utf16 conversion has to take place
    function pdf.hexify(str)
        texwrite("feff")
        for b in str:utfvalues() do
            texwrite(("%04x"):format(b))
        end
    end
two variants

    function pdf.hexify(str)
        texwrite("feff" .. utf.gsub(str,".",function(c)
            local b = byte(c)
            if b < 0x10000 then
                return ("%04x"):format(b)
            else
                b = b - 0x10000 -- surrogates encode the offset above 0xFFFF
                return ("%04x%04x"):format(b/1024+0xD800,b%1024+0xDC00)
            end
        end))
    end

    function pdf.hexify(str)
        texwrite("feff")
        for b in str:utfvalues() do
            if b < 0x10000 then
                texwrite(("%04x"):format(b))
            else
                b = b - 0x10000 -- surrogates encode the offset above 0xFFFF
                texwrite(("%04x%04x"):format(b/1024+0xD800,b%1024+0xDC00))
            end
        end
    end
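For reference, the surrogate arithmetic can be checked outside of luatex with a small standalone sketch. The name utf16hex is only an illustration and not part of any library; it takes a code point as a number and uses math.floor, so it does not depend on how a particular Lua version formats a fractional value with %x.

```lua
-- Hypothetical helper (not a luatex or context function):
-- return the UTF-16 code units of one code point as lowercase hex.
local function utf16hex(cp)
    if cp < 0x10000 then
        -- Basic Multilingual Plane: a single 16-bit unit
        return ("%04x"):format(cp)
    end
    -- Supplementary planes: split the 20-bit offset into two 10-bit halves
    local v = cp - 0x10000
    local high = 0xD800 + math.floor(v / 1024) -- upper 10 bits
    local low = 0xDC00 + v % 1024              -- lower 10 bits
    return ("%04x%04x"):format(high, low)
end

print(utf16hex(0x41))    -- 0041
print(utf16hex(0x1D11E)) -- d834dd1e (musical symbol G clef)
```

The magic numbers are the ones from the wikipedia page: 0x10000 is subtracted first, 1024 splits the remainder into two 10-bit halves, and 0xD800 and 0xDC00 are the bases of the high and low surrogate ranges.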
    \pdfinfo{/Title<\directlua0{pdf.hexify('my title')}>}
so <> instead of () as string delimiter
How does that work?
in pdf traditionally strings (that is, the ones that represent bookmarks and such) were in pdf doc encoding, so

    (pdfdoc encoded string)

then they added utf16 support (a utf16 bom followed by a utf16 sequence); those are still strings. However, at some point another notation was introduced:

    <hex sequence>

which again is utf16 but this time hex encoded (less efficient but so seldom used that it does not really matter)

from (also pdftex's) perspective, both are doable but the hex one is handier when tracing

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------