[NTG-context] Multiple cases of unexpected behaviour in luametatex

Marcel Fabian Krüger tex at 2krueger.de
Fri Jul 3 14:55:45 CEST 2020


Hi,

I recently noticed some cases where luametatex behaved in unexpected
ways:

  - The "Extra \fi" error isn't triggered; instead, an extra `\fi`
    freezes luametatex. (This can be reproduced by compiling a document
    which consists of only a single \fi.)

  - token.new can only create some `data` tokens, but it doesn't apply
    bounds checking to its arguments:

    Take

    ```
    \directlua{
      t = token.new(0x200000, token.command_id'data')
      print(t.cmdname, t.command, t.mode)
    }
    ```

    which prints

    register	102	0

    The issue does not seem to be that such tokens do not exist, because

    \letdatacode\somedata="200000
    \directlua{
      local t = token.create'somedata'
      print(t.cmdname, t.command, t.index)
    }
    
    does print

    data	101	2097152

    Also, for all other commands LuaTeX seems to apply range checks to
    ensure that such overflows don't happen, even if invalid values are
    passed as the first argument.

  - There is token.primitives(). My assumption is that the returned
    table is meant to indicate the command id, mode and name
    corresponding to every primitive. (I think it is awesome that such a
    table is made available in luametatex.) But the mode field in
    particular sometimes has values which do not correspond to the mode
    of the actual primitives:

    I tried running the following in an almost iniTeX setting where all
    primitives aside from \shipout and \Umathcodenum have their default
    definitions:

    ```
    \catcode`\%=12
    \catcode`\~=12
    \directlua{
      local sorted = token.primitives()
      table.sort(sorted, function(a,b) return a[1]<b[1] or a[1]==b[1] and a[2]<b[2]end)
      for _,info in ipairs(sorted) do
        local t = token.create(info[3])
        local rc, rm = t.command, t.mode
        if rc==info[1] and rm ~= info[2] then
          if info[2] == 0 then
            print(string.format('MODE MISMATCH, expected zero: \string\\%s: real: %i, command: %i', info[3], rm, rc))
          else
            print(string.format('MODE MISMATCH: \string\\%s: offset: %i, command: %i', info[3], rm-info[2], rc))
          end
        elseif rc~=info[1] then print(t.csname)
        end
      end
    }
    ```

    This indicates that there are two kinds of differences:
    For some command codes, there are multiple primitives whose second
    entry in the token.primitives table is zero even though their mode
    is not zero. This especially affects the commands `above`,
    `after_something`, `make_box`, `un_vbox`, `set_specification` and
    `car_ret`.
    E.g. for after_something, all of \atendofgrouped, \afterassigned and
    \aftergrouped have a zero as second entry in token.primitives.

    The other difference is that for all the internal_... commands the
    mode field is shifted by a fixed offset, which differs between
    commands.

    IMO the difference for the internal_... commands makes sense because
    it makes for easier-to-use numbers, but having multiple primitives
    indicating mode 0 for the other commands seems to make this table
    significantly less useful because it can't be used to get a unique
    description of a primitive.

    (I may have completely misinterpreted the table of course, but given
    that for other primitives the values match I do not think so)
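
To make the uniqueness problem concrete, here is a minimal sketch in plain
Lua that groups {command, mode, name} triples by their (command, mode) pair
and reports every pair shared by more than one primitive. The helper name
find_collisions and the sample data are only illustrative: the three
after_something names and their zero mode come from the observation above,
but the command codes 42 and 7 are made up, and so is the \relax entry.

```lua
-- Group {command, mode, name} triples by (command, mode) and return a table
-- mapping "command:mode" to the sorted list of names, but only for pairs
-- that are shared by more than one primitive.
local function find_collisions(primitives)
  local groups = {}
  for _, info in ipairs(primitives) do
    local key = info[1] .. ":" .. info[2]
    local names = groups[key]
    if not names then
      names = {}
      groups[key] = names
    end
    names[#names + 1] = info[3]
  end

  local collisions = {}
  for key, names in pairs(groups) do
    if #names > 1 then
      table.sort(names)
      collisions[key] = names
    end
  end
  return collisions
end

-- Hypothetical sample data. The command codes (42, 7) are invented for
-- illustration; only the shared mode 0 of the first three entries reflects
-- what was reported above.
local sample = {
  {42, 0, "atendofgrouped"},
  {42, 0, "afterassigned"},
  {42, 0, "aftergrouped"},
  {7,  0, "relax"},
}

for key, names in pairs(find_collisions(sample)) do
  print(key, table.concat(names, " "))
end
```

Inside \directlua one would presumably pass token.primitives() instead of
the sample table, assuming its entries really have the {command, mode, name}
shape that the test code above relies on.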

Best regards,
Marcel

