xml content and tweaked pdf output
Hi,
very carefully I am trying to take my first steps towards XML and ConTeXt (with MkIV).
Thus, I have enjoyed reading Thomas' MyWay "Getting Web Content and pdf-Output from One Source". I only kept wondering how to keep control over the pdf output in terms of fine-tuning the actual typesetting. A quick search in the archive gave me the answer that is attached below: by using XML entities.
But coming back to Thomas' issue "Getting Web Content and pdf-Output from One Source": What about the other branch, getting web content? Doesn't the XML source get "spoiled" by these inserted XML entities, which only make sense when following the pdf-output branch? Or will these XML entities be silently ignored when feeding the XML source into a CMS or processing it further into web content?
Apologies for asking such basic questions...
Any help or tips for dealing with this hybrid will be greatly appreciated.
Steffen
On 13.04.2008 at 11:54, Taco Hoekwater wrote:
Thomas A. Schmitz wrote:
Hi gang,
speaking of xml... I have two easy questions, but can't find an answer. It's about tweaking the pdf-output I get:
1. How do you add an additional hyphenation to a word? How would I enter the equivalent of super\-duper in an xml-file? I tried super&addhyphen;duper with this definition: \defineXMLentity[addhyphen]{\-}
in my environment, but this doesn't seem to work.
Needs an example file, because
\defineXMLentity[addhyphen]{\-}
\starttext
\hsize 1in
\startXMLdata
I tried super&addhyphen;duper
\stopXMLdata
\stoptext
works in both mkii and mkiv.
2. Similar question: how to prevent an unwanted ligature, esp. in German? In TeX, I write Kauf{}laden. What would be a good way to do this in xml? I was thinking of Kauf&nolig;laden and \defineXMLentity[nolig]{\kern0pt}
That should work, but probably better is: \defineXMLentity[nolig]{\prewordbreak\kern0pt\postwordbreak} because the \kern will disable hyphenation, otherwise.
In mkii, just \defineXMLentity[nolig]{{}} also works (but not in mkiv, and that is a feature).
Best wishes, Taco
Hi,
If you run
\starttext
\startformula
V) V\exists F\exists
\stopformula
\stoptext
the space between V and ), for example, is not correct. (luatex .50)
MO
Mehdi Omidali wrote:
Hi,
If you run
\starttext
\startformula
V) V\exists F\exists
\stopformula
\stoptext
the space between V and ), for example, is not correct. (luatex .50)
There is no italic correction because luatex sees a simple list of mathords, and it does not apply italic corrections between those when a 'new math font' is being used (if it did, operators like "sin" would come out badly). Instead, it relies on ordinary inter-glyph kerning.
I am not at all sure what the best solution for this is, as I have no idea how MS Word differentiates between the formula above and 'multi-letter identifiers'. Perhaps luatex should look at the \catcode of the characters in question? Or perhaps these inter-char kerns actually exist in Cambria?
Best wishes, Taco
On Mar 10, 2010, at 10:58 AM, Steffen Wolfrum wrote:
Hi,
very carefully I am trying to make first steps towards XML and ConTeXt (with MkIV).
Thus, I have enjoyed reading Thomas' MyWay "Getting Web Content and pdf-Output from One Source":
I only kept wondering, how to keep control over the pdf-Output in terms of fine-tuning the actual typesetting? A quick search in the archive gave me the answer that is attached below: by using XMLentities.
But coming back to Thomas' issue "Getting Web Content and pdf-Output from One Source": What about the other branch, getting web content? Doesn't the XML source get "spoiled" by these inserted XML entities, which only make sense when following the pdf-output branch? Or will these XML entities be silently ignored when feeding the XML source into a CMS or processing it further into web content?
Apologies for asking such basic questions...
I'm not really that advanced in this area myself, but from what I think I understood, you have to distinguish several aspects:
1. The MyWay addressed xhtml and mapping that to ConTeXt output. In html, you have a list of predefined entities (http://www.w3schools.com/tags/ref_entities.asp) and I don't think that you can simply define your own entities in html - this simply is not the way this is meant to work. So in this case, the answer to your question would be: you're using the wrong tool.
2. In xml, on the other hand, there are almost no predefined entities; you can and must define entities yourself. But xml in itself cannot be shown as web content; you will need an xsl file which translates your xml to some sort of html. This will allow you to define almost anything you want, and you can indeed add all these typographical niceties. You can then either use a tool such as xsltproc or saxon to produce a "clean" html version yourself or you can leave it to the browser.
So: if you're primarily thinking of web content that should also be typeset, use html and be aware that you probably won't be able to use all the power of ConTeXt. If you're thinking of content that will be typeset but which you also want to use in other forms (web content being just one of them), use xml. In that case, you will have to learn at least some xslt as well...
Btw, the thread you quoted refers to mkii entities; you know that the definitions in mkiv are somewhat different, right?
Thomas
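To make the "xml needs a translation step to become html" point concrete, here is a minimal sketch in Python. It is only a stand-in for what an xsl stylesheet run through xsltproc or saxon would do, not XSLT itself. The sample document, its element names and the function menu_to_html are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample document; the element names are illustrative only.
SOURCE = """<menu>
  <food><name>Belgian Waffles</name><description>with real maple syrup</description></food>
  <food><name>French Toast</name><description>thick slices of sourdough</description></food>
</menu>"""


def menu_to_html(xml_text):
    """Map each <food> element to an HTML list item, roughly the way
    a template rule in a stylesheet would."""
    root = ET.fromstring(xml_text)
    items = []
    for food in root.findall("food"):
        name = escape(food.findtext("name", default=""))
        description = escape(food.findtext("description", default=""))
        items.append(f"<li><strong>{name}</strong>: {description}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(menu_to_html(SOURCE))
```

The same source could be mapped to ConTeXt commands by a different set of rules, which is the "one source, several outputs" idea of the thread.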
On Wed, Mar 10, 2010 at 11:38 AM, Thomas A. Schmitz wrote:
2. In xml, on the other hand, there are almost no predefined entities; you can and must define entities yourself. But xml in itself cannot be shown as web content; you will need an xsl file which translates your xml to some sort of html. This will allow you to define almost anything you want, and you can indeed add all these typographical niceties. You can then either use a tool such as xsltproc or saxon to produce a "clean" html version yourself or you can leave it to the browser.
You can also use css to display xml (http://www.w3schools.com/Xml/xml_display.asp), but xslt is the main way.
-- luigi
On 10.03.2010 at 11:38, Thomas A. Schmitz wrote:
I'm not really that advanced in this area myself, but from what I think I understood, you have to distinguish several aspects:
1. The MyWay addressed xhtml and mapping that to ConTeXt output. In html, you have a list of predefined entities (http://www.w3schools.com/tags/ref_entities.asp) and I don't think that you can simply define your own entities in html - this simply is not the way this is meant to work. So in this case, the answer to your question would be: you're using the wrong tool.
Sorry for being confused: In your MyWay you talk about xml and show an xhtml example. It seems I mixed this.
2. In xml, on the other hand, there are almost no predefined entities; you can and must define entities yourself. But xml in itself cannot be shown as web content; you will need an xsl file which translates your xml to some sort of html. This will allow you to define almost anything you want, and you can indeed add all these typographical niceties. You can then either use a tool such as xsltproc or saxon to produce a "clean" html version yourself or you can leave it to the browser.
Exactly, this is what I meant: Wouldn't those typesetting-oriented entities cause problems here?
If I follow Luigi's link to ... http://www.w3schools.com/Xml/tryxslt.asp?xmlfile=simple&xsltfile=simple
... and naively insert the entity "addhyphen" mentioned below ... "two of our famous Belgian&addhyphen;Waffles with plenty of real maple syrup"
... the xslt process gets disturbed: "XML Parsing Error: undefined entity Location: http://www.w3schools.com/xsl/tryxslt_result.asp Line Number 7, Column 41:"
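The "undefined entity" failure is not specific to that web page. A minimal sketch with Python's standard xml.etree.ElementTree parser (chosen here only for illustration, the one-line document and the helper name parses are made up) shows the same behaviour: a reference to an entity that is declared nowhere makes the document not well-formed, so the parser rejects it instead of silently ignoring it.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-line document using an entity that is declared nowhere.
UNDECLARED = "<description>Belgian&addhyphen;Waffles</description>"

# Same text, but using one of the few entities predefined in XML itself.
PREDEFINED = "<description>Belgian&amp;Waffles</description>"


def parses(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError as err:
        print("rejected:", err)
        return False


print(parses(UNDECLARED))   # the undeclared entity is a fatal error
print(parses(PREDEFINED))   # &amp; needs no declaration
```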
So: if you're primarily thinking of web content that should also be typeset, use html and be aware that you probably won't be able to use all the power of ConTeXt. If you're thinking of content that will be typeset but which you also want to use in other forms (web content being just one of them), use xml. In that case, you will have to learn at least some xslt as well...
Btw, the thread you quoted refers to mkii entities; you know that the definitions in mkiv are somewhat different, right?
When reading Taco's reply to that thread ...
Needs an example file, because
\defineXMLentity[addhyphen]{\-}
\starttext
\hsize 1in
\startXMLdata
I tried super&addhyphen;duper
\stopXMLdata
\stoptext
works in both mkii and mkiv.
... I assumed it's the same in mkii and mkiv? Steffen
On Mar 10, 2010, at 12:49 PM, Steffen Wolfrum wrote:
Sorry for being confused: In your MyWay you talk about xml and show an xhtml example. It seems I mixed this.
xhtml is an application of xml (every xhtml document is also an xml document), AFAIK. But maybe I should add a paragraph explaining this.
Exactly, this is what I meant: Wouldn't those typesetting-oriented entities cause problems here?
If I follow Luigi's link to ... http://www.w3schools.com/Xml/tryxslt.asp?xmlfile=simple&xsltfile=simple
... and naively insert the entity "addhyphen" mentioned below ... "two of our famous Belgian&addhyphen;Waffles with plenty of real maple syrup"
... the xslt process gets disturbed: "XML Parsing Error: undefined entity Location: http://www.w3schools.com/xsl/tryxslt_result.asp Line Number 7, Column 41:"
Yes, as I said: you have to define your entities, e.g. in the DOCTYPE declaration. That's something I discussed with Hans a few weeks ago: in the case you mention, you would have two different definitions of the entity &addhyphen;. One in the DOCTYPE, which will be followed by the xslt processor:
<!ENTITY addhyphen "">
(i.e. do nothing about it), and one in the ConTeXt environment file:
\xmlsetentity{addhyphen}{\-}
which will add the discretionary hyphen. And that's exactly what you wanted: typographical niceties for pdf output which will not disturb viewing the file on the web.
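A small sketch of the DOCTYPE side of this, again using Python's xml.etree.ElementTree purely as an example of a generic XML parser. The root element name, the sample text and the helper description_text are assumptions made for the example. It declares &addhyphen; in the internal DTD subset and shows what the web branch then sees: with an empty replacement text the entity disappears, while a replacement text of one space would leave a space inside the word.

```python
import xml.etree.ElementTree as ET


def description_text(entity_value):
    """Parse a tiny document whose internal DTD subset declares
    &addhyphen; with the given replacement text, and return the
    character data the parser hands on."""
    doc = (
        f'<!DOCTYPE description [<!ENTITY addhyphen "{entity_value}">]>\n'
        "<description>Belgian&addhyphen;Waffles</description>"
    )
    return ET.fromstring(doc).text


print(repr(description_text("")))   # 'BelgianWaffles'
print(repr(description_text(" ")))  # 'Belgian Waffles'
```

The ConTeXt definition via \xmlsetentity is independent of this declaration, so the pdf branch can still insert the discretionary hyphen.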
When reading Taco's reply to that thread ...
..........
... I assumed it's the same in mkii and mkiv?
Rule of thumb: mkii setups use uppercase XML, mkiv uses lowercase xml (see the "Introduction" of xml-mkiv.pdf).
The main difference between the two is [Hans, is this right? correct me if I'm wrong]: mkii basically uses a streaming model, i.e., it translates one part of the xml file after the other. Reusing nodes and elements that have already been processed is possible, but difficult. mkiv loads the entire xml tree into memory; you can access any element at any time.
Thomas
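The streaming-versus-tree distinction can be sketched with Python's xml.etree.ElementTree. This is only an analogy for the two processing models and says nothing about how mkii or mkiv are implemented; the sample document and both function names are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document.
SOURCE = "<doc><title>One Source</title><p>first</p><p>second</p></doc>"


def stream_tags(xml_text):
    """Streaming style: handle each element when it is complete, then
    discard its content. Going back to earlier material is awkward."""
    seen = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("end",)):
        seen.append(elem.tag)
        elem.clear()  # drop content that has already been handled
    return seen


def last_paragraph_then_title(xml_text):
    """Tree style: the whole document is in memory, so any element can
    be reached at any time and in any order."""
    root = ET.fromstring(xml_text)
    last_p = root.findall("p")[-1].text
    title = root.findtext("title")  # jump "back" to an earlier element
    return last_p, title


print(stream_tags(SOURCE))
print(last_paragraph_then_title(SOURCE))
```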
On 10.03.2010 at 13:35, Thomas A. Schmitz wrote:
Yes, as I said: you have to define your entities, e.g. in the DOCTYPE declaration. That's something I discussed with Hans a few weeks ago: in the case you mention, you would have two different definitions of the entity &addhyphen; One in the DOCTYPE, which will be followed by the xslt processor:
<!ENTITY addhyphen "">
(i.e. do nothing about it)
...
So again following Luigi's link ...
http://www.w3schools.com/Xml/tryxslt.asp?xmlfile=simple&xsltfile=simple
... I add a "&addhyphen;" down in the text and add the corresponding definition up in the DOCTYPE line:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE MyWay [<!ENTITY addhyphen " ">]>
<!-- Edited by XMLSpy® -->
On 10-3-2010 13:35, Thomas A. Schmitz wrote:
The main difference between the two is [Hans, is this right? correct me if I'm wrong]: mkii basically uses a streaming model, i.e., it translates one part of the xml file after the other. Reusing nodes and elements that have already been processed is possible, but difficult. mkiv loads the entire xml tree into memory; you can access any element at any time.
indeed
Hans Hagen
participants (6)
- Hans Hagen
- luigi scarso
- Mehdi Omidali
- Steffen Wolfrum
- Taco Hoekwater
- Thomas A. Schmitz