How to process simple HTML files with LuaTeX
Hello,

I was trying to figure out how to process simple HTML files with the new code, but I fail to understand the details. Here's a simple file I would like to process:

    <html>
    <head>
    <title>My first HTML2ConTeXt</title>
    </head>
    <body>
    <h1>Main Title</h1>
    <p>Some text ...</p>
    <h2>Subtitle</h2>
    <p>Some text again ...</p>
    <h1>Second title</h1>
    <p>... and not much more text here either ...</p>
    </body>
    </html>

And the failed tries here:

    % engine=luatex
    \setupcolors[state=start]
    \setuphead[subject][style=bfa,color=blue]
    \setuphead[subsubject][style=tfa,color=blue]

    \starttext
    \xmlload{main}{test.html}{}
    \xmlgrab{main}{h1}{h1}
    \xmlgrab{main}{h2}{h2}

    \startxmlsetups h1
        \subject{H1: #1}
    \stopxmlsetups

    \startxmlsetups h2
        \subsubject{H2: #1}
    \stopxmlsetups

How to grab only the title out of here?

    \xmlfilter{main}{html/head/title}

    \xmlflush{main}
    \stoptext

Any hints are most welcome.

Thanks a lot,
Mojca
keep in mind that this is still somewhat experimental

    % best define mappings before loading the file

    \startxmlsetups all:html
        \xmlsetsetup{main}{head|h1|h2}{*}
    \stopxmlsetups

    \xmlregistersetup{all:html}
    % register this so that it's done for each load

    \startxmlsetups h1
        \subject{\xmlflush{#1}}
    \stopxmlsetups

    \startxmlsetups h2
        \subsubject{\xmlflush{#1}}
    \stopxmlsetups

    \startxmlsetups head
        \startstandardmakeup
            THIS IS ABOUT: \xmlfilter{main}{/head/title/text()}
        \stopstandardmakeup
    \stopxmlsetups

    % that's it

    \setupcolors[state=start]
    \setuphead[subject][style=\bfd,color=blue]
    \setuphead[subsubject][style=\bfc,color=blue]

    \starttext
    \xmlprocess{main}{test.html}{}
    \stoptext

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
On 9/14/07, Hans Hagen wrote:
keep in mind that this is still somewhat experimental
Sure :) That's why I'm sending files for testing :) :) :)
Great! This works perfectly and seems much easier to write than the old code, though I still have no idea how to implement some parts of it:

- where to plug in the entities such as &nbsp;, &le;, ...
- how to catch classes: how to differentiate between <h1>title</h1> and <h1 class="...">title</h1>
- and some more - there are some simple examples in the attachment (too long to copy-paste)

Thanks again,
Mojca
Mojca Miklavec wrote:
Great! This works perfectly and seems much easier to write than the old code, though I still have no idea how to implement some parts of it:
- where to plug in the entities such as &nbsp;, &le;, ...

    \xmlutfize{main}

or just load the regular entity handlers (mkii still works and the two can be mixed)

- how to catch classes: how to differentiate between <h1>title</h1> and <h1 class="...">title</h1>
- and some more - there are some simple examples in the attachment (too long to copy-paste)

    \doifelse {\xmlatt{#1}{class}} {whatever} { dothis } { dothat }

Hans
On 9/14/07, Hans Hagen wrote:
Mojca Miklavec wrote:
Great! This works perfectly and seems much easier to write than the old code, though I still have no idea how to implement some parts of it:
- where to plug in the entities such as &nbsp;, &le;, ...
\xmlutfize{main}
Thanks. I saw it, but had no idea how to use it. I need to test more extensively ... :)
- how to catch classes: how to differentiate between <h1>title</h1> and <h1 class="...">title</h1>
- and some more - there are some simple examples in the attachment (too long to copy-paste)
\doifelse {\xmlatt{#1}{class}} {whatever} { dothis } { dothat }
I have tried exactly that before, but this example fails to work for me, or I don't know how to apply it:

    % test.html
    <html>
    <body>
    <h1>Title 1</h1>
    <h1 class="different">Title 2</h1>
    </body>
    </html>

    % test.tex
    \startxmlsetups all:html
        \xmlsetsetup{main}{h1}{*}
    \stopxmlsetups

    \xmlregistersetup{all:html}

    \startxmlsetups h1
        This title belongs to class (\xmlatt{#1}{class}): \xmlflush{#1}.\par
    \stopxmlsetups

    \starttext
    \xmlprocess{main}{test.html}{}
    \stoptext

Class always comes out empty.

Thanks a lot,
Mojca
Mojca Miklavec wrote:
I have tried exactly that before, but this example fails to work for me, or I don't know how to apply it:
i rewrote the parser (both xml and semi-xpath) so it may have been broken, i'll upload a new beta tomorrow

Hans
On 9/16/07, Hans Hagen wrote:
i rewrote the parser (both xml and semi-xpath) so it may have been broken, i'll upload a new beta tomorrow
Hello Hans,

Thanks a lot for fixing the issue with non-working \xmlatt. Now, I'm still slightly lost regarding two issues:

- How to remove unneeded space? With \ignorespaces?
- How to use the new verbatim code? I have tried to use \xmlsetfunction{main}{pre}{lxml.verbatim} but it didn't really work.

    % test.tex:
    \startxmlsetups all:html
        \xmlsetsetup{main}{h1|pre}{*}
    \stopxmlsetups

    \xmlregistersetup{all:html}

    % is this the proper way?
    \startxmlsetups h1
        \subject{\ignorespaces\xmlflush{#1}}
    \stopxmlsetups

    \startxmlsetups pre
        {\bgroup\tt\obeylines\xmlflush{#1}\egroup}
    \stopxmlsetups

    \starttext
    \xmlprocess{main}{test.html}{}
    \stoptext

    % test.html
    <?xml version="1.0" encoding="utf-8"?>
    <html><body>
    <h1> How to get rid of this spacing in some elegant way? </h1>
    <p>Title followed by a paragraph ...</p>
    <pre> and some source c@de </pre>
    </body></html>

Also, this fails because of the empty line:

    <h1> How to get rid of this spacing in some elegant way? </h1>

Thanks a lot,
Mojca
- i'll make a table mapper (need it anyway), cals tables are already provided
- idem for preformatted and verbatim
- your code:

    d[k] = dk:gsub("&nbsp;",' ')
    dk = d[k]
    d[k] = dk:gsub("&le;", '\\mathematics{\\le}')

    local dk = d[k]
    dk = dk:gsub("&nbsp;",' ')
    dk = dk:gsub("&le;", '\\mathematics{\\le}')
    d[k] = dk

or ....

    mojcasentities = { nbsp = " ", le = "\\mathematics{\\le}" }
    d[k] = d[k]:gsub("&(.-);",mojcasentities)

(there probably already is code for that)

Hans
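The table-driven variant suggested above can be tried in plain Lua, outside ConTeXt. The following is only a minimal sketch: the function name resolve_entities and the sample strings are made up for illustration, and only the nbsp and le entries come from this thread. It relies on the documented behaviour of string.gsub that, when the replacement is a table, the first capture is used as the key and a missing key leaves the match unchanged.

```lua
-- Entity table: only nbsp and le come from the thread.
local entities = {
    nbsp = " ",
    le   = "\\mathematics{\\le}",
}

-- Hypothetical helper: replace every &name; whose name is a key in the table.
-- Names that are not in the table yield nil, so gsub keeps the original text.
local function resolve_entities(str)
    local result = str:gsub("&(.-);", entities)
    return result -- drop gsub's second return value (the match count)
end

print(resolve_entities("a&nbsp;&le;&nbsp;b"))  --> a \mathematics{\le} b
print(resolve_entities("keep &unknown; as is")) --> keep &unknown; as is
```

Values taken from a replacement table are inserted literally, so the backslashes need no extra escaping beyond the usual Lua string syntax.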
On 9/14/07, Hans Hagen wrote:
- i'll make a table mapper (need it anyway), cals tables are already provided
- idem for preformatted and verbatim
Thanks a lot. I'm waiting patiently :)
- your code:
    d[k] = dk:gsub("&nbsp;",' ')
    dk = d[k]
    d[k] = dk:gsub("&le;", '\\mathematics{\\le}')

    local dk = d[k]
    dk = dk:gsub("&nbsp;",' ')
    dk = dk:gsub("&le;", '\\mathematics{\\le}')
    d[k] = dk

or ....

    mojcasentities = { nbsp = " ", le = "\\mathematics{\\le}" }
    d[k] = d[k]:gsub("&(.-);",mojcasentities)
Thanks a lot!
(there probably already is code for that)
Yes, I saw it, but didn't try to understand what the &(.-) is for. In any case, that was the wrong place to replace le with something.

Thanks again,
Mojca
On Sun, 16 Sep 2007, Mojca Miklavec wrote:
On 9/14/07, Hans Hagen wrote:

    mojcasentities = { nbsp = " ", le = "\\mathematics{\\le}" }
    d[k] = d[k]:gsub("&(.-);",mojcasentities)

Yes, I saw it, but didn't try to understand what the &(.-) is for.
(Caveat: I do not really know lua regex, and have not tried out the code)

Assuming lua follows standard regex syntax, this means

    &    # The letter &
    (    # start a group
    .    # any character
    -    # As few as needed
    )    # end group
    ;    # the letter ;

so this will match all entities. If it helps, the equivalent vim regex will be

    \&\(.\{-}\);

I guess that $1 (the first group, that is everything that matches .-) will be compared with the mojcasentities table and replaced accordingly.

This looks like a really nice feature of lua. In Ruby and Vim, I often find myself writing a bunch of similar regex, and always wished there was something like what lua does.

Aditya
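The "as few as needed" reading of - can be checked with a small plain-Lua experiment. This is only a sketch with a made-up sample string; it compares the lazy .- from the thread with the greedy .* to show why the lazy form is the one that isolates each entity name.

```lua
local text = "x &le; y &nbsp; z"

-- Lazy: ".-" stops at the first ";" that follows each "&".
local lazy = {}
for name in text:gmatch("&(.-);") do
    lazy[#lazy + 1] = name
end

-- Greedy: ".*" runs on to the last ";" in the string.
local greedy = text:match("&(.*);")

print(table.concat(lazy, ","))  --> le,nbsp
print(greedy)                   --> le; y &nbsp
```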
Aditya Mahajan wrote:
(Caveat: I do not really know lua regex, and have not tried out the code)
they are not regexp but expressions -)
Assuming lua follows standard regex syntax, this means
    &    # The letter &
    (    # start a group
    .    # any character
    -    # As few as needed
    )    # end group
    ;    # the letter ;
so this will match all entities.
just &(.-); with () being the capture
If it helps, the equivalent vim regex will be \&\(.\{-}\);
I guess that $1 (the first group, that is everything that matches .-)
%1
will be compared with the mojcasentities table and replaced accordingly.
indeed
This looks like a really nice feature of lua. In Ruby and Vim, I often find myself writing a bunch of similar regex, and always wished there was something like what lua does.
the nice thing about many lua features is that less code (lua c code) turns out to be more powerful

Hans
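The capture and %1 remarks above can be illustrated with a short plain-Lua sketch. The sample string and variable names are invented for the example; the calls themselves are standard string.gsub usage, where %1 in a replacement string stands for the first capture and a function replacement receives the captures as arguments.

```lua
local s = "a &le; b"

-- %1 in the replacement string refers to the first capture (the entity name).
local marked, count = s:gsub("&(.-);", "[%1]")
print(marked, count)  --> a [le] b	1

-- A function replacement receives the same capture as its argument.
local upper = s:gsub("&(.-);", function(name) return name:upper() end)
print(upper)          --> a LE b
```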
participants (3)
- Aditya Mahajan
- Hans Hagen
- Mojca Miklavec