Re: [Dev-luatex] Unicode in \pdfinfo

1 Jul 2008

      Jonathan Sauer wrote:
...
Hello,
...
slightly more efficient ... (char and byte accept multiple 
arguments) and you can use write which is faster too)
Thanks!
...
in context i use something quick and dirty (no > 0x10000 
checking but from your function i can deduce the magic numbers -)
Well, they are simply the ones mentioned in
http://en.wikipedia.org/wiki/Utf-16.
sure, but i didn't realize that a simple / worked ok; trunc/round stuff 
and so

anyhow, a helper function in luatex would be handy, not that this is 
such a critical issue; in a tex run hardly any utf16 conversion has to 
take place
...
...
function pdf.hexify(str)
     texwrite("feff")
     for b in str:utfvalues() do
         texwrite(("%04x"):format(b))
     end
end
two variants

function pdf.hexify(str)
     texwrite("feff" .. utf.gsub(str,".",function(c)
         local b = utf.byte(c)
         if b < 0x10000 then
             return ("%04x"):format(b)
         else
             -- surrogate pair: subtract 0x10000 before splitting into 10-bit halves
             b = b - 0x10000
             return ("%04x%04x"):format(math.floor(b/1024)+0xD800,b%1024+0xDC00)
         end
     end))
end

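function pdf.hexify(str)
     texwrite("feff")
     for b in str:utfvalues() do
         if b < 0x10000 then
             texwrite(("%04x"):format(b))
         else
             -- surrogate pair: subtract 0x10000 before splitting into 10-bit halves
             b = b - 0x10000
             texwrite(("%04x%04x"):format(math.floor(b/1024)+0xD800,b%1024+0xDC00))
         end
     end
end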
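
For checking the numbers outside a tex run, here is a minimal sketch in plain Lua. It assumes the UTF-16 rule from the Wikipedia page mentioned earlier: code points from 0x10000 upward have 0x10000 subtracted first, and the remaining 20 bits are split into two 10-bit halves added to 0xD800 and 0xDC00. The name utf16_hex is hypothetical (it is not a LuaTeX function), and it takes a table of code points instead of a string so that it does not depend on any utf library. Code points inside the surrogate range itself are not rejected.

```lua
-- Sketch only: utf16_hex is a hypothetical helper, not part of LuaTeX.
-- Input:  a table of Unicode code points (numbers).
-- Output: hex-encoded UTF-16BE, prefixed with the byte order mark feff.
function utf16_hex(codepoints)
    local parts = { "feff" }
    for _, b in ipairs(codepoints) do
        if b < 0x10000 then
            -- Basic Multilingual Plane: a single 16-bit unit
            parts[#parts + 1] = ("%04x"):format(b)
        else
            -- Supplementary planes: subtract 0x10000, then split the
            -- remaining 20 bits into a high and a low 10-bit half
            local v = b - 0x10000
            local high = 0xD800 + math.floor(v / 1024)
            local low = 0xDC00 + v % 1024
            parts[#parts + 1] = ("%04x%04x"):format(high, low)
        end
    end
    return table.concat(parts)
end

-- "A", "é" and U+1D11E (musical symbol G clef)
print(utf16_hex({ 0x41, 0xE9, 0x1D11E }))  --> feff004100e9d834dd1e
```

math.floor is used instead of a bare / so that the same code also runs on Lua versions where a fractional number is not accepted by %x.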
...
...
\pdfinfo{/Title<\directlua0{pdf.hexify('my title')}>}
so <> instead of () as string delimiter
How does that work?
in pdf, strings (that is, the ones that represent bookmarks and 
such) were traditionally in pdfdoc encoding, so

	(pdfdoc encoded string)

then they added utf16 support

	(utf16bom followed by utf16 sequence)

that's still strings. However, at some point another notation was 
introduced:

	<hex sequence>

which again is utf16 but this time hex encoded (less efficient but so 
seldom used that it does not really matter)

from (also pdftex's) perspective, both are doable but the hex one is 
handier when tracing
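
To illustrate the tracing point, here is a rough sketch in plain Lua that turns such a hex string back into code points. The name utf16_hex_decode is hypothetical (nothing like it is claimed to exist in luatex), and the sketch assumes well-formed input: groups of four hex digits, an optional leading feff, and optional <> delimiters. An unpaired surrogate is passed through unchanged.

```lua
-- Sketch only: utf16_hex_decode is a hypothetical helper, not part of LuaTeX.
-- Input:  a hex string such as "<feff0041>" (delimiters and BOM optional).
-- Output: a table of Unicode code points.
function utf16_hex_decode(hex)
    -- Collect the 16-bit units; the <> delimiters are skipped because
    -- they are not hex digits
    local units = {}
    for h in hex:gmatch("%x%x%x%x") do
        units[#units + 1] = tonumber(h, 16)
    end

    local i = 1
    if units[1] == 0xFEFF then
        i = 2 -- skip the byte order mark
    end

    local codepoints = {}
    while i <= #units do
        local u, nxt = units[i], units[i + 1]
        if u >= 0xD800 and u <= 0xDBFF
           and nxt and nxt >= 0xDC00 and nxt <= 0xDFFF then
            -- High plus low surrogate: rebuild the 20-bit value
            codepoints[#codepoints + 1] =
                0x10000 + (u - 0xD800) * 1024 + (nxt - 0xDC00)
            i = i + 2
        else
            codepoints[#codepoints + 1] = u
            i = i + 1
        end
    end
    return codepoints
end

for _, cp in ipairs(utf16_hex_decode("<feff004100e9d834dd1e>")) do
    print(("U+%04X"):format(cp))  --> U+0041, U+00E9, U+1D11E
end
```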

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
      tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------