Hello,
I have uploaded the patch EscapeAndOther, #375.
The patch is based on pdftex-1.30.0-rc1.
It supersedes patch EscapeDivers.tar.bz2, #371.
Inherited from patch EscapeDivers:
Expandable: \pdfstrcmp, \pdfescapestring, \pdfescapename
Added: \pdfescapehex, \pdfunescapehex
Fix: \pdfescapename: '%' is also a delimiter that
needs escaping.
New in patch EscapeAndOther:
* \pdfcreationdate
* \pdffilemoddate
* \pdffilesize
* \pdfmdfivesum
* \pdffiledump
* \pdfshellescape
* \pdfmatch and \pdflastmatch
* Start of a bug fix: quotes are legal in file names on
operating systems other than Windows.
Syntax:
%% expandable commands:
\pdfstrcmp <general text> <general text>
\pdfescapestring <general text>
\pdfescapename <general text>
\pdfescapehex <general text>
\pdfunescapehex <general text>
\pdfcreationdate
\pdffilemoddate <general text>
\pdffilesize <general text>
\pdfmdfivesum <file spec> | <general text>
\pdfmatch <match options> <general text> <general text>
<match options> := [icase] [subcount <number>]
\pdflastmatch <number>
%% read-only integers
\pdfshellescape
Common to the expandable commands:
* The result string is given by characters with
catcode "other" (12), only the space has
catcode "space" (10). The commands follow
the tradition of \meaning, \string, ...
* The argument <general text> is expanded before use.
It follows the tradition of \special, \message,
\pdfobj, ...
Description:
\pdfstrcmp{<a>}{<b>}
Compares two strings and expands to the string
"0" if <a> equals <b>
"-1" if <a> is less than <b>
"1" if <a> is greater than <b>
Use example:
\ifcase\pdfstrcmp{abc}{def}\relax
\message{abc = def}
\or
\message{abc > def}
\else
\message{abc < def}
\fi
Alternative:
Implementing it as a read-only integer would make its
use in \ifcase, \ifnum, ... safer:
\ifcase\pdfstrcmp{xyz}{abc}1\or 2\else 3\fi
expands to 3; as a read-only integer it would expand
to the expected 2.
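A small wrapper can hide the number-termination problem described above. This is only a sketch: \strcase is a hypothetical helper name, not part of the patch, and it assumes that \pdfstrcmp expands to "0", "1" or "-1" as described.

```tex
% Hypothetical helper (not part of the patch):
% \strcase{<a>}{<b>}{<less>}{<equal>}{<greater>}
\def\strcase#1#2#3#4#5{%
  \ifcase\pdfstrcmp{#1}{#2} % this space ends the number
    #4% result 0: <a> equals <b>
  \or
    #5% result 1: <a> is greater than <b>
  \else
    #3% result -1: <a> is less than <b>
  \fi
}
\message{\strcase{abc}{def}{less}{equal}{greater}}
```

The explicit space after the second argument terminates the number, so text that starts with a digit in one of the branches cannot be absorbed into it.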
\pdfescapestring{<a>}
Escapes the string <a> so that it can be used as a PDF string.
'(', ')', '\' are escaped along with the
control and 8-bit characters.
Use example:
\pdfinfo{/Title (\pdfescapestring{...})}
\special{ps: [ /Title (\pdfescapestring{...}) /DOCINFO pdfmark}
Alternative:
Perhaps 8-bit characters don't need escaping.
Whitespace, especially newlines, should be escaped because
the result is also used with latex/dvips, to avoid recoding
problems (<LF> -> <CR><LF>).
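A typical case where the escaping matters is a title with unbalanced parentheses. The following is a minimal sketch; \settitle is a hypothetical helper name and the title text is made up for illustration.

```tex
% Hypothetical helper (not part of the patch): set the document title
% from arbitrary text; \pdfescapestring escapes '(', ')' and '\'.
\def\settitle#1{\pdfinfo{/Title (\pdfescapestring{#1})}}
\settitle{Intervals [a,b) and (a,b]}
```

Without the escaping, the first ')' in the title would end the PDF string early.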
\pdfescapename{<a>}
Escapes the string <a> so that it can be used as a PDF name.
Whitespace, delimiters and '#' are escaped along with
the control and 8-bit characters, as recommended by the PDF spec.
Use example:
\pdfobj stream attr{/Type/EmbeddedFile%
/Subtype/\pdfescapename{text/plain; charset=iso-8859-1}%
...
} file {...}
\pdfescapehex{<a>}, \pdfunescapehex{<b>}
String <a> is converted to uppercase hexadecimal
representation, <b> is converted back.
Use example:
\pdfescapehex{Hello} is converted to 48656C6C6F
\pdfinfo{/Title <\pdfescapehex{Hello}>}
It can also be used to write strings to auxiliary
files and reread them later without worrying about
catcodes or unmatched curly braces.
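A round trip could look like the following sketch. The macro names \hexed and \restored are chosen for illustration only; \space is the usual plain TeX/LaTeX macro.

```tex
\edef\hexed{\pdfescapehex{Hello}}        % 48656C6C6F, as in the example above
\edef\restored{\pdfunescapehex{\hexed}}  % converted back; the argument is expanded first
\message{\hexed\space -> \restored}
```

Because the hexadecimal form consists of digits and the letters A-F only, it can be written to a file and read back under any catcode regime.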
\pdfcreationdate
It expands to the date string that pdfTeX uses in
the info dict as default.
Rationale:
* It provides seconds and especially the time zone.
* Setting of /M date for annotations.
* Because of the complicated change file structure
of the sources it is not easy to synchronize
the creation date with \year, \month, \day
and \time. Thus \pdfcreationdate can be
used to set these registers to the same
values.
Example:
\pdfcreationdate expands to D:20050625015605+02'00'
\pdfannot ...{/Subtype /FileAttachment
/M (\pdfcreationdate) ...}
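Setting \year, \month, \day and \time from \pdfcreationdate, as mentioned in the rationale, could be sketched as follows. The helper names \syncdate, \syncdateA and \syncdateB are hypothetical, and the code assumes the format D:YYYYMMDDhhmmss followed by the time zone, as in the example above.

```tex
% Hypothetical helpers (not part of the patch). The characters "D" and ":"
% arrive with catcode 12, so they are taken as undelimited arguments
% instead of being used as delimiters in the parameter text.
\def\syncdate{\expandafter\syncdateA\pdfcreationdate\relax}
\def\syncdateA#1#2#3#4#5#6{% #1#2 = D:, #3#4#5#6 = YYYY
  \year=#3#4#5#6\relax
  \syncdateB}
\def\syncdateB#1#2#3#4#5#6#7#8#9\relax{% MM DD hh mm, #9 = rest
  \month=#1#2\relax
  \day=#3#4\relax
  \time=#5#6\relax      % hours ...
  \multiply\time by 60  % ... converted to minutes
  \advance\time by #7#8\relax}
\syncdate
```

For D:20050625015605+02'00' this gives \year=2005, \month=6, \day=25 and \time=116 (minutes since midnight). The seconds and the time zone are ignored here.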
\pdffilemoddate{<file>}
It expands to the modification date of <file> in the
same format as \pdfcreationdate (PDF date format).
On error it returns the empty string.
Rationale:
* File embedding: the date is shown in the attachment tab.
* "Make feature": the dates of files can be compared to check
which file is newer.
Example: pdfTeX does not support EPS files; epstopdf.sty
converts them to PDF either always or only if the PDF variants
do not exist. Neither way is satisfactory: either the
time penalty is large or PDF files are embedded that
are out of date. \pdffilemoddate solves this problem.
Error handling, see \pdffilesize
I have not implemented a primitive for the file creation date
because of a portability issue: The "ctime" field of struct "stat"
is interpreted differently among operating systems:
* creation date, e.g. Windows
* inode change time, e.g. Unix
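The "make feature" can be sketched with \pdfstrcmp, because PDF date strings sort like the dates they encode. \whenoutdated is a hypothetical helper name and the file names are placeholders. The comparison is purely lexicographic, so it is only reliable if both dates carry the same time zone offset.

```tex
% Hypothetical helper (not part of the patch):
% \whenoutdated{<source>}{<target>}{<true>}{<false>}
\def\whenoutdated#1#2#3#4{%
  \ifnum\pdfstrcmp{\pdffilemoddate{#1}}{\pdffilemoddate{#2}}>0
    #3%
  \else
    #4%
  \fi}
\whenoutdated{figure.eps}{figure.pdf}
  {\message{figure.pdf is missing or older than figure.eps}}
  {\message{figure.pdf is up to date}}
```

A missing target yields the empty string, which compares as less than any date, so the <true> branch is taken in that case as well.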
\pdffilesize{<file>}
It expands to the size of <file> as string. On error it
returns the empty string.
Rationale:
* File embedding: the size is shown in the attachment tab.
* Sometimes it is useful to know if a file has size "0"
(failed conversions, ...).
Error handling:
Currently the empty string is silently returned.
Alternatives:
* Stop with error message. But what can the user do?
Error recovery is easy: no information available,
thus return nothing.
* Warning message. But the primitives could be used
for checks on file existence.
* Return status in \pdfretval. I doubt a little whether
this is really necessary. It is very easy to implement.
But the documentation grows by a large list of
error codes with its problems: there is much to explain
to the users. How are they assigned?
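With the current behaviour (empty string on error) a file check can be written expandably. The following is a sketch only; \filestatus is a hypothetical helper name, and the example assumes that the main file is called \jobname.tex.

```tex
% Hypothetical helper (not part of the patch); expands to one of
% "missing", "empty" or "nonempty".
\def\filestatus#1{%
  \ifnum\pdfstrcmp{\pdffilesize{#1}}{}=0
    missing%
  \else
    \ifnum\pdffilesize{#1}=0 empty\else nonempty\fi
  \fi}
\message{\jobname.tex: \filestatus{\jobname.tex}}
```

Note that "missing" here covers every error case, since the primitive does not distinguish between them.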
\pdfmdfivesum{<abc>} or \pdfmdfivesum file {<file>}
It calculates the md5 sum and converts it to
uppercase hexadecimal format (same as \pdfescapehex).
The syntax is a simplified \pdfobj: Either the
data is given directly or in a file.
Rationale:
* File embedding: providing /CheckSum.
* Also the md5 sums of auxiliary files could be stored
and compared in order to display a rerun warning.
(Of course, it is possible that different files
have the same checksum, but the same file never
has different checksums.)
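The rerun check could be sketched as follows. The file \jobname.toc stands for any auxiliary file, and \sumbefore and \sumafter are names chosen for illustration. The sketch assumes that the file is closed before the second checksum is taken; what \pdfmdfivesum returns for a missing file is not covered here.

```tex
% Checksum of a (hypothetical) auxiliary file at the start of the run ...
\edef\sumbefore{\pdfmdfivesum file {\jobname.toc}}
% ... the document is processed, \jobname.toc is rewritten and closed ...
\edef\sumafter{\pdfmdfivesum file {\jobname.toc}}
\ifx\sumbefore\sumafter
\else
  \message{\jobname.toc has changed, rerun to get it right}
\fi
```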
\pdfshellescape
It is a read-only integer that is 1 if \write18 is
enabled, 0 otherwise.
Rationale:
* I thought that \ifeof18 together with \pdftexversion
could be used to implement a safe test for the \write18
feature. But I had to learn that I was wrong, see the
thread in comp.text.tex: "Confused about pstricks and
pdftricks":
MiKTeX's pdfTeX does not implement \ifeof18.
For implementing the test, only an obscure way
via \pdftexbanner remains, but this way is not
very reliable: the contents of \pdftexbanner
are not well defined, it could be anything.
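A test based on the new primitive could look like this sketch. The call to epstopdf and the file name figure.eps are only placeholders for whatever \write18 is supposed to run.

```tex
\ifx\pdfshellescape\undefined
  % pdfTeX without this patch: the state of \write18 is unknown
  \message{\string\pdfshellescape\space is not available}
\else
  \ifnum\pdfshellescape=1
    \immediate\write18{epstopdf figure.eps}
  \else
    \message{\string\write18 is disabled}
  \fi
\fi
```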
Quotes in file names:
* File name handling is quite chaotic. Quotes are removed
by the scanner for \input, \open*. This behaviour is
contradictory: to solve a problem with spaces, quotes
are now forbidden in file names.
* Inconsistencies:
The syntax of \pdfobj and \pdfximage would allow
any file name, but \pdfobj removes quotes; only
\pdfximage uses the more intelligent way, it removes
quotes on Windows only.
Thus the patch fixes:
* append_to_name (used in pack_file_name): quotes are removed
on Windows only. This fixes \pdfobj.
* utils.c: new function "makecfilename",
used by \pdfximage (readimage), \pdffilesize, ...
\pdfmatch [icase] [subcount <number>] {<pattern>}{<string>}
Implements pattern matching using the POSIX regex library
(a standard library, at least on my Linux system).
It returns the same values as \pdfstrcmp, but
with the following semantics:
-1: error case (invalid pattern, ...)
0: no match
1: match found
Options:
* icase: case insensitive matching
* subcount: it sets the table size for found subpatterns.
A number "-1" resets the table size to the start default.
See the manual page regex.3 and regex.7.
The implementation shows a possible interface to
pattern matching in TeX. Therefore only the basics
are implemented.
Flags:
* REG_EXTENDED is set in the implementation.
* REG_ICASE: can be set by user.
* other: not implemented.
\pdflastmatch <number>
The result of \pdfmatch is stored in an array.
The entry "0" contains the match, the following
entries submatches. The positions of the matches
are also available. They are encoded in the following
manner to avoid another primitive:
<position> "->" <match string>
"->" is used as separator in the tradition of \meaning.
There exist macros for parsing the output of \meaning
(e.g. in LaTeX: \strip@prefix).
The position "-1" with an empty string indicates that
this entry is not set.
Example:
\def\msg#{\immediate\write16 }
\msg{\pdfmatch{(l+)o (W(o))}{Hello World}}
\msg{\pdflastmatch0}
\msg{\pdflastmatch1}
\msg{\pdflastmatch2}
\msg{\pdflastmatch3}
\msg{\pdflastmatch4}
Result:
1
2->llo Wo
2->ll
6->Wo
7->o
-1->
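The encoded entries can be taken apart with a delimited macro. This is a sketch under the assumption that "-" and ">" have their usual catcode 12 when the macro is defined; \splitmatch, \matchpos and \matchtext are hypothetical names.

```tex
% Hypothetical helper (not part of the patch): split "<position>-><match>".
% "-" and ">" must have their usual catcode 12 when this is defined.
\def\splitmatch#1->#2\relax{%
  \def\matchpos{#1}%
  \def\matchtext{#2}}
\ifnum\pdfmatch{(l+)o (W(o))}{Hello World}=1
  \expandafter\splitmatch\pdflastmatch 1\relax
  \message{submatch 1: position \matchpos, text "\matchtext"}
\fi
```

With the result list above, entry 1 is "2->ll", so \matchpos becomes 2 and \matchtext becomes ll. An unset entry ("-1->") gives position -1 and an empty text.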
Alternative:
PCRE (Perl-compatible regular expressions) is far more
powerful: more options, named subpatterns, ...
License for 0.4 was GPL compatible, since 0.5 it is BSD,
current version is 0.6.
The TeX interface could be changed in the following way:
* Addition: \pdflastmatchbyname <general text>
It extracts matches for named subpatterns.
* Options can be given by the same name as in the
PCRE description:
\pdfmatch anchored caseless ... {}{}
For easier/faster scanning the options could be
restricted to be given in sorted order.
* Or options can be given by letters in any order
in an additional argument:
\pdfmatch{<pattern>}{<options>}{<string>}
\pdfmatch{l+}{ai}{Hello World}
The implementation could then use strchr to check,
whether an option is set.
Patch instructions for testing are given in the
patch description at Sarovar.
Have fun
Heiko