Strings in LuaTeX's pdfe

3 Apr 2020

Hi,

I recently tried to do something with the embedded pdfe library and
noticed that accessing strings comes with certain problems. PDF strings
are always returned in raw form without the surrounding <> or (), so any
script using them needs to know whether it got a hex string or a
"normal" ()-delimited string in order to treat it correctly. So
pdfe.getstring is a bit weird: It gives a Lua string but no indication
of which type of string is returned. If pdfe.getstring e.g. returns
"425", it can either correspond to the actual text "425" or be the
hexadecimal encoding of "BP". Given that PDF allows both upper and
lowercase letters and even an odd number of digits in a hexadecimal
string, even guessing the right format is hard and error-prone, making
pdfe.getstring not particularly useful. The same issue appears with the
`__index` metamethods of dictionaries and arrays. This is especially
weird because it's inconsistent with PDF names, which always get decoded
before they are passed to the user.
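
To make the "425" case concrete, here is a minimal sketch in plain Lua.
decode_hex is a hypothetical helper written for this mail, not a pdfe
function, and the value of raw just stands in for whatever
pdfe.getstring handed back:

```lua
-- Hypothetical helper (not part of pdfe): decode the body of a PDF hex
-- string, i.e. the text between < and >.
local function decode_hex(raw)
  -- Whitespace inside a hex string is ignored.
  local digits = raw:gsub("%s+", "")
  -- An odd number of digits is treated as if a trailing 0 were present.
  if #digits % 2 == 1 then
    digits = digits .. "0"
  end
  -- Turn every pair of hex digits (upper or lower case) into one byte.
  return (digits:gsub("%x%x", function(pair)
    return string.char(tonumber(pair, 16))
  end))
end

local raw = "425"                -- raw body, delimiters already stripped
local as_literal = raw           -- if it was (425): the text "425"
local as_hex = decode_hex(raw)   -- if it was <425>: bytes 0x42 0x50, "BP"
print(as_literal, as_hex)
```

Both readings are valid for the same raw value, so the script cannot
pick one without knowing the original delimiters.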

Also, even after the Lua script figures out whether it is a hex string
or a literal string, it has to decode it. (Of course this part only
applies if the actual value is needed and not if it should only be
passed into another PDF string.) That's not complicated, but it feels
weird: After all, the underlying pplib already decoded the string, so it
seems like it would be easier to make this decoded version accessible to
the user.
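
For the literal-string case, the decoding step a script has to carry
around looks roughly like the sketch below. decode_literal is again a
hypothetical helper and not part of pdfe. It assumes the raw body still
contains the backslash escapes, and it skips the normalisation of
unescaped line endings:

```lua
-- Hypothetical helper (not part of pdfe): undo the backslash escapes in
-- the body of a PDF literal string, i.e. the text between ( and ).
local escapes = {
  n = "\n", r = "\r", t = "\t", b = "\b", f = "\f",
  ["("] = "(", [")"] = ")", ["\\"] = "\\",
}

local function decode_literal(raw)
  local out, i, n = {}, 1, #raw
  while i <= n do
    local c = raw:sub(i, i)
    if c ~= "\\" then
      -- Ordinary byte: copy it through unchanged.
      out[#out + 1] = c
      i = i + 1
    else
      local nxt = raw:sub(i + 1, i + 1)
      -- One to three octal digits directly after the backslash.
      local octal = raw:match("^[0-7][0-7]?[0-7]?", i + 1)
      if octal then
        -- High-order overflow is ignored, hence the modulo.
        out[#out + 1] = string.char(tonumber(octal, 8) % 256)
        i = i + 1 + #octal
      elseif escapes[nxt] then
        out[#out + 1] = escapes[nxt]
        i = i + 2
      elseif nxt == "\r" or nxt == "\n" then
        -- Backslash followed by an end-of-line is a line continuation.
        i = i + 2
        if nxt == "\r" and raw:sub(i, i) == "\n" then
          i = i + 1
        end
      else
        -- Unknown escape: the backslash is dropped, the byte is kept.
        out[#out + 1] = nxt
        i = i + 2
      end
    end
  end
  return table.concat(out)
end

print(decode_literal("a\\(b\\) \\101"))
```

None of this is hard, but it duplicates work that already happened
inside the library.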

So would it be possible to maybe either change the existing functions or
add new ones to

  1. return the already decoded value and/or
  2. give an indication if a literal or a hex string is returned?

Best regards,
Marcel

Marcel Fabian Krüger
