[NTG-pdftex] \pdfmatch (was: Patch EscapeAndOther)

Martin Schröder martin at oneiros.de
Thu Jun 30 12:06:32 CEST 2005


On 2005-06-25 03:52:01 +0200, Heiko Oberdiek wrote:
> \pdfmatch [icase] [subcount <number>] {<pattern>}{<string>}
>   Implements pattern matching using the POSIX regex
>   (a standard library at least in my linux).
>   It returns the same values as \pdfstrcmp, but
>   with the following semantics:
>     -1: error case (invalid pattern, ...)
>      0: no match
>      1: match found
>   Options:
>   * icase: case insensitive matching
>   * subcount: it sets the table size for found subpatterns.
>     A number "-1" resets the table size to the start default.
> 
>   See the manual page regex.3 and regex.7.
> 
>   The implementation shows a possible interface to
>   pattern matching in TeX. Therefore only the basics
>   are implemented.
>   Flags:
>   * REG_EXTENDED is set in the implementation.
>   * REG_ICASE: can be set by user.
>   * other: not implemented.
> 
> \pdflastmatch <number>
>   The result of \pdfmatch is stored in an array.
>   The entry "0" contains the match, the following
>   entries submatches. The positions of the matches
>   are also available. They are encoded in the following
>   manner to avoid another primitive:
>     <position> "->" <match string>
>   "->" is used as separator in the tradition of \meaning.
>   There exist macros for parsing the output of \meaning
>   (e.g. in LaTeX: \strip@prefix).
>   The position "-1" with an empty string indicates that
>   this entry is not set.
>   Example:
>     \def\msg#{\immediate\write16 }
>     \msg{\pdfmatch{(l+)o (W(o))}{Hello World}}
>     \msg{\pdflastmatch0}
>     \msg{\pdflastmatch1}
>     \msg{\pdflastmatch2}
>     \msg{\pdflastmatch3}
>     \msg{\pdflastmatch4}
>   Result:
>     1
>     2->llo Wo
>     2->ll
>     6->Wo
>     7->o
>     -1->
> 
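
The return values and the "<position>-><match string>" encoding
described above can be sketched in C on top of POSIX regex.h. This is
only a minimal sketch under stated assumptions, not the code of the
patch: the function name match_entry, the fixed table size MAX_SUB and
the output buffer handling are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed table size; the proposal makes it configurable
   through the "subcount" option. */
#define MAX_SUB 10

/* Returns -1 on error (e.g. invalid pattern), 0 on no match, 1 on match.
   On a match, table entry `idx` is written to `out` in the form
   "<position>-><match string>"; an unset entry is written as "-1->". */
static int match_entry(const char *pattern, const char *string, int icase,
                       int idx, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[MAX_SUB];
    int flags = REG_EXTENDED | (icase ? REG_ICASE : 0);
    int rc;

    assert(out != NULL && outsize > 0);
    out[0] = '\0';
    if (idx < 0 || idx >= MAX_SUB)
        return -1;
    if (regcomp(&re, pattern, flags) != 0)
        return -1;                      /* invalid pattern */

    memset(m, 0, sizeof m);             /* defensive initialisation */
    rc = regexec(&re, string, MAX_SUB, m, 0);
    regfree(&re);
    if (rc == REG_NOMATCH)
        return 0;
    if (rc != 0)
        return -1;

    if (m[idx].rm_so < 0)               /* regexec marks unused entries with -1 */
        snprintf(out, outsize, "-1->");
    else
        snprintf(out, outsize, "%d->%.*s", (int)m[idx].rm_so,
                 (int)(m[idx].rm_eo - m[idx].rm_so), string + m[idx].rm_so);
    return 1;
}
```

Calling match_entry with the pattern "(l+)o (W(o))", the string
"Hello World" and the indices 0 to 4 should give the same five entries
as the Result list of the example.
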
> Alternative:
>   PCRE (Perl-compatible regular expressions) is far more
>   powerful. More options, named subpatterns, ...
>   License for 0.4 was GPL compatible, since 0.5 it is BSD,
>   current version is 0.6.
> 
>   The TeX interface could be changed in the following way:
>   * Addition: \pdflastmatchbyname <general text>
>     It extracts matches for named subpatterns.
>   * Options can be given by the same name as in the
>     PCRE description:
>     \pdfmatch anchored caseless ... {}{}
>     For easier/faster scanning the options could be
>     restricted to be given in sorted order.
>   * Or options can be given by letters in any order
>     in an additional argument:
>       \pdfmatch{<pattern>}{<options>}{<string>}
>       \pdfmatch{l+}{ai}{Hello World}
>     The implementation could then use strchr to check,
>     whether an option is set.
> 
> Patch instructions for testing are given in the
> patch description at sarovar.

While this is a VERY nice feature, I'm reluctant to include it
in 1.30.0 because
- we are (in theory at least) in feature-freeze, and this is
  definitely a new feature :-)
- it may need more testing
- I doubt that regex.h is portable; we should keep Windows in
  mind.

Comments?

Best regards
    Martin
-- 
                    http://www.tm.oneiros.de

