Hello,
I was trying to figure out how to process simple HTML files with the new code, but I fail to understand the details. Here's a simple file I would like to process:
<html>
  <head>
    <title>My first HTML2ConTeXt</title>
  </head>
  <body>
    <h1>Main Title</h1>
    <p>Some text ...</p>
    <h2>Subtitle</h2>
    <p>Some text again ...</p>
    <h1>Second title</h1>
    <p>... and not much more text here either ...</p>
  </body>
</html>
And here are my failed attempts:
% engine=luatex
\setupcolors[state=start]
\setuphead[subject][style=bfa,color=blue]
\setuphead[subsubject][style=tfa,color=blue]
\starttext
\xmlload{main}{test.html}{}
\xmlgrab{main}{h1}{h1}
\xmlgrab{main}{h2}{h2}
\startxmlsetups h1
  \subject{H1: #1}
\stopxmlsetups
\startxmlsetups h2
  \subsubject{H2: #1}
\stopxmlsetups
How to grab only the title out of here?
\xmlfilter{main}{html/head/title}
\xmlflush{main}
\stoptext
Any hints are most welcome.
Thanks a lot, Mojca
Mojca Miklavec wrote:
keep in mind that this is still somewhat experimental
% best define mappings before loading the file
\startxmlsetups all:html
  \xmlsetsetup{main}{head|h1|h2}{*}
\stopxmlsetups
\xmlregistersetup{all:html}
% register this so that it's done for each load
\startxmlsetups h1
  \subject{\xmlflush{#1}}
\stopxmlsetups
\startxmlsetups h2
  \subsubject{\xmlflush{#1}}
\stopxmlsetups
\startxmlsetups head
  \startstandardmakeup
    THIS IS ABOUT: \xmlfilter{main}{/head/title/text()}
  \stopstandardmakeup
\stopxmlsetups
% that's it
\setupcolors[state=start]
\setuphead[subject][style=\bfd,color=blue]
\setuphead[subsubject][style=\bfc,color=blue]
\starttext
\xmlprocess{main}{test.html}{}
\stoptext
----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl -----------------------------------------------------------------
On 9/14/07, Hans Hagen wrote:
keep in mind that this is still somewhat experimental
Sure :) That's why I'm sending files for testing :) :) :)
Great! This works perfectly and seems much easier to write than the old code, though I still have no idea how to implement some parts of it:
- where to plug in the entities such as &nbsp;, &le;, ...
- how to catch classes: how to differentiate between <h1>title</h1> and <h1 class="...">title</h1>
- and some more; there are some simple examples in the attachment (too long to copy-paste)
Thanks again, Mojca
Mojca Miklavec wrote:
Great! This works perfectly and seems much easier to write than the old code, though I still have no idea how to implement some parts of it:
- where to plug in the entities such as &nbsp;, &le;, ...
\xmlutfize{main}
or just load the regular entity handlers (MkII still works and can be mixed in)
- how to catch classes: how to differentiate between <h1>title</h1> and <h1 class="...">title</h1>
- and some more; there are some simple examples in the attachment (too long to copy-paste)
\doifelse {\xmlatt{#1}{class}} {whatever} { dothis } { dothat }
Hans
On 9/14/07, Hans Hagen wrote:
Mojca Miklavec wrote:
Great! This works perfectly and seems much easier to write than the old code, though I still have no idea how to implement some parts of it:
- where to plug in the entities such as &nbsp;, &le;, ...
\xmlutfize{main}
Thanks. I saw it, but had no idea how to use it. I need to test more extensively ... :)
- how to catch classes: how to differentiate between <h1>title</h1> and <h1 class="...">title</h1>
- and some more; there are some simple examples in the attachment (too long to copy-paste)
\doifelse {\xmlatt{#1}{class}} {whatever} { dothis } { dothat }
I have tried exactly that before, but this example fails to work for me, or I don't know how to apply it:
% test.html
<html>
  <body>
    <h1>Title 1</h1>
    <h1 class="different">Title 2</h1>
  </body>
</html>
% test.tex
\startxmlsetups all:html
  \xmlsetsetup{main}{h1}{*}
\stopxmlsetups
\xmlregistersetup{all:html}
\startxmlsetups h1
  This title belongs to class (\xmlatt{#1}{class}): \xmlflush{#1}.\par
\stopxmlsetups
\starttext
  \xmlprocess{main}{test.html}{}
\stoptext
Class always comes out empty.
Thanks a lot, Mojca
Mojca Miklavec wrote:
I have tried exactly that before, but this example fails to work for me, or I don't know how to apply it:
i rewrote the parser (both xml and semi-xpath) so it may have been broken, i'll upload a new beta tomorrow
Hans
On 9/16/07, Hans Hagen wrote:
i rewrote the parser (both xml and semi-xpath) so it may have been broken, i'll upload a new beta tomorrow
Hello Hans,
Thanks a lot for fixing the issue with non-working \xmlatt.
Now, I'm still slightly lost regarding two issues:
- How to remove unneeded space? With \ignorespaces?
- How to use the new verbatim code? I have tried to use \xmlsetfunction{main}{pre}{lxml.verbatim} but it didn't really work.
% test.tex:
\startxmlsetups all:html
  \xmlsetsetup{main}{h1|pre}{*}
\stopxmlsetups
\xmlregistersetup{all:html}
% is this the proper way?
\startxmlsetups h1
  \subject{\ignorespaces\xmlflush{#1}}
\stopxmlsetups
\startxmlsetups pre
  {\bgroup\tt\obeylines\xmlflush{#1}\egroup}
\stopxmlsetups
\starttext
  \xmlprocess{main}{test.html}{}
\stoptext
% test.html
<?xml version="1.0" encoding="utf-8"?>
<html><body>
<h1> How to get rid of this spacing in some elegant way? </h1>
<p>Title followed by a paragraph ...</p>
<pre> and some source c@de </pre>
</body></html>
Also, this fails because of the empty line:
<h1> How to get rid of this spacing in some

elegant way? </h1>
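One way to attack both spacing problems, assuming the element text is available as a Lua string before it is passed to TeX, is to normalize it first. This is only a sketch in plain Lua: `normalize_space` is a hypothetical helper, not part of the lxml code, and whether the XML handler offers a hook for it is an open assumption. It collapses every run of whitespace, including the empty line (which TeX would presumably turn into a \par inside the title), into a single space and trims both ends.

```lua
-- Hypothetical helper (not an lxml function): collapse all whitespace runs
-- (spaces, tabs, newlines, blank lines) into one space, then trim both ends.
local function normalize_space(s)
  s = s:gsub("%s+", " ")  -- a blank line becomes a single space, so no \par survives
  s = s:gsub("^ ", "")    -- drop the leading space, if any
  s = s:gsub(" $", "")    -- drop the trailing space, if any
  return s
end

local title = "\n  How to get rid of this spacing in some\n\nelegant way?  \n"
print("[" .. normalize_space(title) .. "]")
-- prints: [How to get rid of this spacing in some elegant way?]
```

Each `s = s:gsub(...)` keeps only the first return value (the new string) and discards the match count.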
Thanks a lot, Mojca
Mojca Miklavec wrote:
Sure :) That's why I'm sending files for testing :) :) :)
- I'll make a table mapper (I need it anyway); CALS tables are already provided
- idem for preformatted and verbatim
- your code:
d[k] = dk:gsub("&nbsp;",' ')
dk = d[k]
d[k] = dk:gsub("&le;", '\mathematics{\le}')

local dk = d[k]
dk = dk:gsub("&nbsp;",' ')
dk = dk:gsub("&le;", '\\mathematics{\\le}')
d[k] = dk
or ....
mojcasentities = { nbsp = " ", le = "\\mathematics{\\le}" }
d[k] = d[k]:gsub("&(.-);", mojcasentities)
(there probably already is code for that)
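For reference, here is a self-contained sketch of the table-driven variant in plain Lua (5.1 or later), outside ConTeXt. The table `entities` and the function `replace_entities` are hypothetical names, and mapping `nbsp` to a TeX tie `~` is an assumption. When `string.gsub` gets a table as its replacement, it uses the first capture as the key; if the key is missing, the match is left unchanged, so unknown entities survive.

```lua
-- Hypothetical entity table. Backslashes are doubled so that the
-- resulting TeX code reads \mathematics{\le}.
local entities = {
  nbsp = "~",                   -- assumption: &nbsp; becomes a TeX tie
  le   = "\\mathematics{\\le}",
}

-- Replace every &name; whose name is a key in the table;
-- names without an entry (lookup gives nil) are left as they are.
local function replace_entities(s)
  local result = s:gsub("&(.-);", entities)  -- keep only the string, drop the count
  return result
end

print(replace_entities("a&nbsp;b &le; c &amp; d"))
-- prints: a~b \mathematics{\le} c &amp; d
```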
On 9/14/07, Hans Hagen pragma@wxs.nl wrote:
- i'll make a table mapper (need it anyway), cals tables are already provided
- idem for preformatted and verbatim
Thanks a lot. I'm waiting patiently :)
- your code:
d[k] = dk:gsub("&nbsp;",' ')
dk = d[k]
d[k] = dk:gsub("&le;", '\mathematics{\le}')

local dk = d[k]
dk = dk:gsub("&nbsp;",' ')
dk = dk:gsub("&le;", '\\mathematics{\\le}')
d[k] = dk
or ....
mojcasentities = { nbsp = " ", le = "\\mathematics{\\le}" }
d[k] = d[k]:gsub("&(.-);", mojcasentities)
Thanks a lot!
(there probably already is code for that)
Yes, I saw it, but didn't try to understand what the &(.-); is for. In any case, that was the wrong place to replace le with something.
Thanks again, Mojca
On Sun, 16 Sep 2007, Mojca Miklavec wrote:
On 9/14/07, Hans Hagen pragma@wxs.nl wrote:
mojcasentities = { nbsp = " ", le = "\\mathematics{\\le}" }
d[k] = d[k]:gsub("&(.-);", mojcasentities)
Yes, I saw it, but didn't try to understand what the &(.-) serves for.
(Caveat: I do not really know lua regex, and have not tried out the code)
Assuming lua follows standard regex syntax, this means
&   # the letter &
(   # start a group
.   # any character
-   # as few as needed
)   # end group
;   # the letter ;
so this will match all entities.
If it helps, the equivalent Vim regex would be &\(.\{-}\);
I guess that $1 (the first group, that is, everything that matches .-) will be looked up in the mojcasentities table and replaced accordingly. This looks like a really nice feature of Lua. In Ruby and Vim, I often find myself writing a bunch of similar regexes, and have always wished there was something like what Lua does.
Aditya
Aditya Mahajan wrote:
(Caveat: I do not really know lua regex, and have not tried out the code)
they are not regexps but Lua patterns :-)
Assuming lua follows standard regex syntax, this means
&   # the letter &
(   # start a group
.   # any character
-   # as few as needed
)   # end group
;   # the letter ;
so this will match all entities.
just &(.-); with () being the capture
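The `-` in `.-` is what keeps each match short. A quick check in plain Lua (5.1 or later) shows the difference from the greedy `*` on a string that contains two entities; the test string is made up for illustration.

```lua
local s = "x&nbsp;y&le;z"

-- Lazy: .- matches as few characters as possible,
-- so each entity name is captured separately.
local lazy = {}
for name in s:gmatch("&(.-);") do
  lazy[#lazy + 1] = name
end
print(table.concat(lazy, ","))   -- prints: nbsp,le

-- Greedy: .* runs to the last ';' and swallows both entities in one capture.
print(s:match("&(.*);"))         -- prints: nbsp;y&le
```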
If it helps, the equivalent Vim regex would be &\(.\{-}\);
I guess that $1 (the first group, that is everything that matches .-)
%1
will be compared with mojcaentities table and replaced accordingly.
indeed
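In a Lua replacement string the captures are written `%1`, `%2`, and so on, with `%0` standing for the whole match. A small illustration in plain Lua; the bracket and angle-bracket output formats are arbitrary choices for the demo.

```lua
local s = "a&nbsp;b&le;c"

-- %1 inserts the first capture, i.e. the entity name without & and ;
-- gsub also returns the number of substitutions made.
local tagged, count = s:gsub("&(.-);", "[%1]")
print(tagged)    -- prints: a[nbsp]b[le]c
print(count)     -- prints: 2

-- %0 inserts the complete match, including & and ;
local wrapped = s:gsub("&(.-);", "<%0>")
print(wrapped)   -- prints: a<&nbsp;>b<&le;>c
```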
This looks like a really nice feature of lua. In Ruby and Vim, I often find myself writing a bunch of similar regex, and always wished there was something like what lua does.
the nice thing about many Lua features is that a small amount of code (Lua's C code) gives a lot of power