Hi,
I'm currently trying to insert XMP metadata into a PDF file in order to make it PDF/A compliant.
The XML stuff is a Lua string and I'm using pdf.obj() in order to create the object. However, leading whitespace is removed by pdf.obj(). As far as the XML stuff is concerned, it's not a big problem if indentation gets lost because XML parsers ignore it anyway.
However, I have to add about 2..4 kB of padding and all validators insist on space characters (0x20). This padding is important because it allows content management systems to extend the XML stuff without the need to re-generate the xref table.
After all, I would expect that pdf.obj() allows inserting even binary stuff because PDF objects can contain anything. Hence I assume that the current behavior is not desired.
I didn't try pdf.immediateobj() yet but I fear that it's affected too.
Regards,
Reinhard
--
----------------------------------------------------------------------------
Reinhard Kotucha Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover mailto:reinhard.kotucha@web.de
----------------------------------------------------------------------------
Microsoft isn't the answer. Microsoft is the question, and the answer is NO.
----------------------------------------------------------------------------
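The padding requirement described in this message can be sketched in plain Lua. This is only an illustration, not the code from the original project: the helper names make_padding and wrap_xmp are hypothetical, the 2048-byte size is an assumed value inside the 2..4 kB range mentioned above, and the empty x:xmpmeta body is a placeholder. The packet wrapper uses the customary XMP xpacket processing instructions.

```lua
-- Hypothetical helper: nbytes of padding made only of space characters (0x20).
local function make_padding(nbytes)
  return string.rep(" ", nbytes)
end

-- Hypothetical helper: wrap an XMP body in a writable packet with padding.
local function wrap_xmp(body, nbytes)
  local bom = "\239\187\191"  -- U+FEFF encoded as UTF-8
  return table.concat {
    '<?xpacket begin="' .. bom .. '" id="W5M0MpCehiHzreSzNTczkc9d"?>\n',
    body, "\n",
    make_padding(nbytes), "\n",
    '<?xpacket end="w"?>',
  }
end

-- Placeholder body; a real packet would carry the PDF/A metadata here.
local body = "<x:xmpmeta xmlns:x='adobe:ns:meta/'></x:xmpmeta>"
local packet = wrap_xmp(body, 2048)
print(#packet)
```

Because the string is assembled by Lua itself, the padding is exactly the bytes requested, as long as the Lua code is not passed through TeX's input processing on the way.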
On 3/8/2014 1:46 AM, Reinhard Kotucha wrote:
Hi, I'm currently trying to insert XMP metadata into a PDF file in order to make it PDF/A compliant.
The XML stuff is a Lua string and I'm using pdf.obj() in order to create the object. However, leading whitespace is removed by pdf.obj(). As far as the XML stuff is concerned, it's not a big problem if indentation gets lost because XML parsers ignore it anyway.
However, I have to add about 2..4 kB of padding and all validators insist on space characters (0x20). This padding is important because it allows content management systems to extend the XML stuff without the need to re-generate the xref table.
After all, I would expect that pdf.obj() allows inserting even binary stuff because PDF objects can contain anything. Hence I assume that the current behavior is not desired.
I didn't try pdf.immediateobj() yet but I fear that it's affected too.
this works quite ok
local s = [[
<foo> <bar> some crap </bar> </foo>
]]
pdf.refobj(pdf.obj("stream",s))
pdf.immediateobj(s)
of course if you embed that kind of lua code in a tex file then you have to make sure that you use a catcode regime that retains spaces ..
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
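As an illustration of the reply above, here is a minimal sketch that builds the stream content entirely in Lua and hands it to the same pdf.obj() and pdf.refobj() calls. The sample lines and the 2048-byte padding are assumed values chosen for the example. The guard around the pdf table is there only so that the file also runs with a standalone Lua interpreter, where that table does not exist.

```lua
-- Build the stream content in Lua, so that no TeX catcodes are involved.
local lines = {
  "<foo>",
  "  <bar> some text </bar>",  -- two leading spaces, on purpose
  "</foo>",
  string.rep(" ", 2048),       -- padding (the size is an assumed value)
}
local s = table.concat(lines, "\n")

-- The pdf table is provided by LuaTeX; a standalone interpreter skips this.
if pdf and pdf.obj then
  local n = pdf.obj("stream", s)  -- create an object holding s as a stream
  pdf.refobj(n)                   -- make sure the object is written out
end
```

Whether the leading spaces survive then depends only on how this Lua source reaches the interpreter, which is the point about catcode regimes made in the reply.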
On 2014-03-08 at 12:45:07 +0100, Hans Hagen wrote:
this works quite ok
local s = [[
<foo> <bar> some crap </bar> </foo>
]]
pdf.refobj(pdf.obj("stream",s))
pdf.immediateobj(s)
of course if you embed that kind of lua code in a tex file then you have to make sure that you use a catcode regime that retains spaces ..
Hi Hans,
thank you for the response. I'm still confused. I didn't expect that I have to care about catcodes when I use pure Lua functions.
But there are some oddities. First of all, in my actual project I'm using pdf.immediateobj() and it seems to work there. But I don't know why it works there but not in my MWE (see attachment).
Of course, I could continue but I hesitate to add more complexity until I completely grok what's going on. I fear that I break it when I change something anywhere else.
What also confuses me is that in the MWE spaces are preserved unless they appear at the beginning of a line. I.e., the line
<!-- -->
appears as
<!-- -->
in the PDF file. I would expect that with \catcode32=10 all spaces are collapsed and leading spaces are ignored but with \catcode32=12 all spaces are retained.
If you try the MWE on Linux, just run the makefile.
Regards,
Reinhard
On 3/9/2014 12:30 AM, Reinhard Kotucha wrote:
Hi Hans, thank you for the response. I'm still confused. I didn't expect that I have to care about catcodes when I use pure lua functions.
But there are some oddities. First of all, in my actual project I'm using pdf.immediateobj() and it seems to work there. But I don't know why it works there but not in my MWE (see attachment).
Of course, I could continue but I hesitate to add more complexity until I completely grok what's going on. I fear that I break it when I change something anywhere else.
What also confuses me is that in the MWE spaces are preserved unless they appear at the beginning of a line. I.e., the line
<!-- -->
appears as
<!-- -->
in the PDF file. I would expect that with \catcode32=10 all spaces are collapsed and leading spaces are ignored but with \catcode32=12 all spaces are retained.
If you try the MWE on Linux, just run the makefile.
As we're supposedly talking latex here it's a bit off topic for this list but anyway ...
I can't run your example as i have a context-only setup here, but looking at your code:
\input{xmp.lua}
\bgroup\catcode32=12
\luadirect{XMPobj("\pdfcreationdate")}
\egroup
that wouldn't help much. What I meant with catcodes is that when you *load* the lua code inside tex it matters what the catcodes are, so, it should be
\bgroup\catcode32=12
\input{xmp.lua}
\luadirect{XMPobj("\pdfcreationdate")}
\egroup
But then the next question is: what does \begin{luacode*} do with catcodes; I'm sure you cannot change that without also messing with other usage.
The easiest solution is to just load the file using lua's "dofile", so something:
\luadirect{dofile("xml.lua")}
Hans
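The dofile() suggestion above can be checked outside TeX with a small sketch. It writes a temporary Lua file containing a long string with an indented line, loads it with dofile(), and prints what arrives. The file content and the temporary path are made up for the example. The point is that dofile() lets the Lua interpreter read the file itself, so TeX's catcode handling never sees the text.

```lua
-- Write a small Lua file whose long string contains an indented line.
local path = os.tmpname()
local f = assert(io.open(path, "w"))
f:write('return [[\n    <!-- indented -->\n]]\n')
f:close()

-- dofile() makes the Lua interpreter read the file directly.
local s = dofile(path)
os.remove(path)

-- Show the string with its whitespace made visible.
print(string.format("%q", s))
```

Inside a TeX run the equivalent is the \luadirect{dofile("xml.lua")} call shown in the message.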
On 2014-03-09 at 18:52:25 +0100, Hans Hagen wrote:
As we're supposedly talking latex here it's a bit off topic for this list but anyway ...
Hi Hans,
first of all, thank you very much. It's much clearer now.
Sorry for using LaTeX instead of plain TeX or ConTeXt, but it turned out that then your answer would be less helpful. ;)
luacode.sty sets \catcode32=10 at the beginning indeed. Hence no chance to change it outside the luacode environment.
The easiest solution is to just load the file using lua's "dofile", so something:
\luadirect{dofile("xml.lua")}
This works fine in my MWE.
I think I have grokked now why leading spaces are retained in my project but not in the MWE. The MWE is a TeX file which loads xml.lua wrapped into a luacode environment loaded with \input.
My project is much more complex. The main Lua file is loaded the same way as in the MWE. But it loads a bunch of other Lua files (also xml.lua) with require(). It seems that require() has the same effect as dofile().
After all, you helped me a lot and I think that I can continue now with what I have without the fear that I break everything when I change something anywhere else.
Thanks a lot,
Reinhard