Re: [NTG-context] EPUB XHTML Format
On Thu, 5 Sep 2013, honyk wrote:
On 2013-09-04 Thangalin wrote:
What needs to happen to take a minimal ConTeXt file (such as the attached) to produce a minimum viable EPUB that:
It is always difficult to parse and further process poorly structured plain text that lacks rich semantics. Garbage in, garbage out.
The typical ConTeXt document has a lot of structure, and the XML export generates well-structured XML output. That output can be used directly in most modern browsers that handle XML+CSS well. However, most (all?) EPUB readers don't. So, the question is asking if instead ConTeXt could generate an XHTML
If you need both EPUB and PDF, start with a semantically rich XML vocabulary, e.g. DocBook. In that case you can transform (XSLT) the input data into almost any format relatively easily. Basic outputs such as EPUB or PDF (via XSL-FO) are available out of the box. ConTeXt output can be generated using dbcontext: http://dblatex.sourceforge.net/
In sum, use XML as your primary source and from it derive everything else.
I haven't used XML-only toolchains. Is it possible to handle:

- Automatic section numbering taking care of different conversions.
- Automatic index generation and sorting.
- Inserting hyphenation points at the appropriate place in the generated output (so that the browser can effectively rely on TeX's hyphenation algorithm to do line breaking).
- Converting TeX math to MathML.

The current ConTeXt XML source can translate a well-formed ConTeXt document into an XML document with the above features.

Aditya
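The first item, section numbering with different conversions, is easy to underestimate. As a rough illustration only (this is not how ConTeXt implements it, and the conversion names `arabic`, `roman` and `alpha` are hypothetical), a generator that numbers headings from their nesting depth, with a separate conversion per level, might look like this:

```python
from typing import Callable, Iterable, Iterator


def to_roman(n: int) -> str:
    """Convert a positive integer to upper-case Roman numerals."""
    table = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in table:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_alpha(n: int) -> str:
    """Convert 1 -> a, 26 -> z, 27 -> aa (spreadsheet-style letters)."""
    letters = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))


# Hypothetical conversion names; ConTeXt has its own set of conversions.
CONVERSIONS: dict[str, Callable[[int], str]] = {
    "arabic": str,
    "roman": to_roman,
    "alpha": to_alpha,
}


def number_headings(depths: Iterable[int], conversions: list[str]) -> Iterator[str]:
    """Yield a formatted number for each heading, given its depth (1 = chapter).

    conversions[i] names the conversion used for depth i + 1.
    Assumes well-nested input (no jump from depth 1 straight to depth 3).
    """
    counters: list[int] = []
    for depth in depths:
        # Drop the counters of deeper levels, then open new levels at zero.
        counters = counters[:depth]
        counters.extend([0] * (depth - len(counters)))
        counters[-1] += 1
        yield ".".join(CONVERSIONS[conversions[level]](count)
                       for level, count in enumerate(counters))
```

For example, `list(number_headings([1, 2, 2, 1, 2], ["roman", "arabic"]))` gives `["I", "I.1", "I.2", "II", "II.1"]`. A real implementation also has to cope with unnumbered sections and keep the numbers consistent with the table of contents and cross-references.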
On 9/5/2013 8:20 PM, Aditya Mahajan wrote:
The typical ConTeXt document has a lot of structure, and the XML export generates a well structured XML output. That can be directly used in most modern browsers that handle XML+CSS well. However, most (all?) EPUB readers don't. So, the question is asking if instead ConTeXt could generate an XHTML
but how hard would it be to make an xslt transformation from context.export to epub variants? (ok, at some point i can look into it, but only if there is a robust standard and i have devices to test it on)

and indeed the quality of the source is important

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
I'd say use an XML source (DocBook, TEI, or DITA) and then write a ConTeXt
stylesheet to typeset your XML. See http://wiki.contextgarden.net/TEI_xml
I think that TEI Lite is a nice, very general XML vocabulary...
Best,
Mica
_______________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
_______________________________________________________________________________
On 05/09/2013 20:24, Hans Hagen wrote:
but how hard would it be to make an xslt transformation from context.export to epub variants (ok, at some point i can look into it but only if there is a robust standard and i have devices to test it on)
and indeed the quality of the source is important
That sounds like by far the cleanest approach.

Cheers, mh
Hi,

handle XML+CSS well. However, most (all?) EPUB readers don't. So, the
question is asking if instead ConTeXt could generate an XHTML
Precisely.
If you need both EPUB and PDF, start with a semantically rich XML
vocabulary, e.g. DocBook. In this case you can relatively easily transform
My database doesn't generate DocBook. It generates a custom XML document from which I generate a web page, and a LaTeX document (though soon to be ConTeXt!). There is no reason, technically, why I cannot convert the source XML to either DocBook or directly to EPUB. There are, however, problems doing that, which Aditya correctly surmises:
- Automatic section numbering taking care of different conversions.
- Automatic index generation and sorting.
- Inserting hyphenation points at the appropriate place in the generated output (so that the browser can effectively rely on TeX's hyphenation algorithm to do line-breaking).
- Convert TeX math to MathML.
The current ConTeXt XML source can translate a well-formed ConTeXt document into an XML document with the above features.
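The hyphenation item amounts to carrying TeX's break points into the XHTML as soft hyphens (U+00AD), which browsers treat as break opportunities under the default CSS `hyphens: manual`. The sketch below assumes the break positions have already been obtained from TeX; the `points` dictionary is a hypothetical stand-in for that data. It writes the literal character rather than `&shy;`, since that named entity is not predefined in XML.

```python
import re

SOFT_HYPHEN = "\u00ad"


def insert_soft_hyphens(word: str, breaks: list[int]) -> str:
    """Insert a soft hyphen before word[i] for every i in `breaks`."""
    pieces = []
    last = 0
    for pos in sorted(set(breaks)):
        if 0 < pos < len(word):  # a break at either edge is meaningless
            pieces.append(word[last:pos])
            last = pos
    pieces.append(word[last:])
    return SOFT_HYPHEN.join(pieces)


def hyphenate_text(text: str, points: dict[str, list[int]]) -> str:
    """Apply precomputed break points (keyed by lower-case word) to running text."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        return insert_soft_hyphens(word, points.get(word.lower(), []))
    return re.sub(r"\w+", replace, text)
```

With `{"hyphenation": [2, 6]}` (the TeX-style split hy-phen-ation), `hyphenate_text("Hyphenation works", ...)` yields `"Hy\u00adphen\u00adation works"`; words without known break points pass through unchanged.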
Those are exactly the issues that I would love to resolve using ConTeXt for generating an EPUB. (The MathML isn't as important to me, but I can see other people wanting such a feature.)

What about accessibility? I expect that visually impaired people would
depend on document structure rather than its visualisation.
That is a good point. The current XML structure produced by ConTeXt (Hans, correct me if I'm mistaken) is not accessible, as it doesn't adhere to strict XHTML. I suspect that <div> tags would not be accessible -- the only way to provide true accessibility in EPUB format would be by using the strict XHTML tags.

for instance, we have more levels than H1..H6, so how to do H7? if someone
has to deal with that, he/she may as well transform everything into H1 with some class, which is then a local solution
I realize there is not going to be a one-to-one mapping of all possible ConTeXt macros to XHTML. Someone who has 7 levels of nested sections would have to either rewrite some Lua or perform some post-processing (e.g., with XSLT). I would posit that a document with 7 levels of nested sections is not going to be a common occurrence.

When I talk about strict XHTML, I'm proposing that a _simple_ ConTeXt document (up to 6 header levels, numbered and unnumbered lists, images, text emphasis, etc.) should generate a simple, validating XHTML document. Trying to attain 100% coverage of ConTeXt transmogrification to XHTML is ridiculous when, I suspect, 80% coverage would meet most needs. :-)

It is definitely possible to translate the ConTeXt EPUB output to XHTML. However, there are practical realities that hinder such an approach. Architecturally, if anyone is going to translate an XML document to EPUB format, it certainly won't be this way:

*XML + XSLT -> ConTeXt File -> ConTeXt EPUB XML + XSLT -> EPUB + CSS*

It'll be this way, which is less time-consuming, less complex, and less error-prone:

*XML + XSLT (or API) -> EPUB + CSS*

However, as we all know, that does not produce output as feature-rich as leveraging the ConTeXt abilities that Aditya mentioned, which was the point:

*XML + XSLT -> ConTeXt TeX -> EPUB + CSS*

Kindest regards.
On 9/6/2013 12:00 AM, Thangalin wrote:
That is a good point. The current XML structure produced by ConTeXt (Hans correct me here if I'm mistaken) is not accessible, as it doesn't adhere to strict XHTML. I suspect that <div> tags would not be accessible -- the only way to provide true accessibility in EPUB format would be by using the strict XHTML tags.
html is not rich enough ... one ends up abusing tags, which in turn is confusing for accessibility ... i once saw an epub where h1 was used for the chapter number and h2 for the chapter title
When I talk about strict XHTML, I'm proposing that a _simple_ ConTeXt document (up to 6 header levels, numbered and unnumbered lists, images, text emphasis, etc.) should generate a simple, validating XHTML document. Trying to attain 100% coverage of ConTeXt transmogrification to XHTML is ridiculous when, I suspect, 80% coverage would meet most needs.. :-)
in that case a few-page transformation could do, couldn't it?
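A minimal sketch of such a transformation, written with Python's `xml.etree.ElementTree` rather than XSLT. Nested `<section>`/`<title>`/`<p>` elements (a hypothetical input vocabulary, not the actual ConTeXt export format) become flat XHTML headings and paragraphs: depths 1 to 6 map to `h1` to `h6`, and anything deeper falls back to `h6` with a hypothetical `level-N` class, one of the local solutions discussed above. Mapping everything to `h1` plus a class instead would only change `heading_element`.

```python
import xml.etree.ElementTree as ET


def heading_element(level: int) -> ET.Element:
    """Return an XHTML heading element for a section depth (1 = outermost)."""
    if level <= 6:
        return ET.Element(f"h{level}")
    # XHTML stops at h6; keep the real depth in a class so CSS can style it.
    return ET.Element("h6", {"class": f"level-{level}"})


def convert_section(section: ET.Element, body: ET.Element, level: int = 1) -> None:
    """Append headings and paragraphs for `section` and its subsections to `body`."""
    title = section.find("title")
    if title is not None:
        heading = heading_element(level)
        heading.text = title.text
        body.append(heading)
    for child in section:
        if child.tag == "section":
            convert_section(child, body, level + 1)
        elif child.tag == "p":
            ET.SubElement(body, "p").text = child.text
```

Fed seven nested sections, this emits `h1` through `h6` followed by `<h6 class="level-7">`. Real XHTML output would additionally need the XHTML namespace and a full document wrapper.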
*XML + XSLT -> ConTeXT TeX -> EPUB + CSS*
probably ok for novels, but there is no way to limit the user ... so in the end we still have a complex mix to deal with ... i'd rather have

*ConTeXt TeX reading xml -> export -> optional transform -> EPUB + CSS*

you want 'direct epub html from context' (no xslt) but on the other hand use xslt to map onto context, while context can do xml directly ... chicken egg

Hans
Another small note, since I just walked down the ePUB path: you'll be very
sad to find out that a lot of rendering engines for popular readers are
inconsistent and won't render standard XHTML markup correctly (nest an ordered
list within an unordered list and then look at it in Adobe Digital Editions
and several other readers). "But it is just XHTML + CSS!" you'll cry, "How
can they not render it correctly?" I don't know, but it was an extremely
frustrating process. I even contacted Adobe to try and report this nested
list bug to them... their suggestion was that I could *pay* them to work
with "content experts" who would help me "correct" my source so that it
would render "correctly."
The best reader imho is iBooks on the iPad, nothing else, from what I've
seen, comes close. But that is one expensive eReader. :(
Hi,

The best reader imho is iBooks on the iPad, nothing else, from what I've
seen, comes close. But that is one expensive eReader. :(
We'll just have everybody in the world who has a Kindle, Kobo, or other reader exchange their existing hardware and then purchase an iPad plus iBooks. Problem solved? ;-)

*ConTeXt TeX reading xml -> export -> optional transform -> EPUB + CSS*
you want 'direct epub html from context' (no xslt) but on the other hand use xslt to map onto context while context can do xml directly ... chicken egg
Well, given that ConTeXt doesn't actually produce validating EPUB documents, I suspect not many people will actually use that feature. It's great in theory, but if it produces books that don't actually work on the Kindle or Kobo, then it's unusable in practice -- never mind not being able to add the books to online marketplaces (such as Amazon) because, again, the output does not validate.

Kind regards.
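Full EPUB validation is the job of a dedicated validator such as EpubCheck. The sketch below only checks a few container-level rules of the EPUB Open Container Format: the `mimetype` entry must come first, be stored uncompressed and contain `application/epub+zip`, and `META-INF/container.xml` must point at a package document that exists in the archive. It illustrates what "does not validate" can mean at the lowest level; it is not a substitute for a validator, and the function name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def basic_ocf_problems(epub_bytes: bytes) -> list[str]:
    """Return container-level problems found in an EPUB file (empty list = none found)."""
    problems: list[str] = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        infos = zf.infolist()
        # Rule 1: 'mimetype' is the first entry, stored uncompressed, with exact content.
        if not infos or infos[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        elif infos[0].compress_type != zipfile.ZIP_STORED:
            problems.append("'mimetype' is compressed")
        elif zf.read("mimetype") != b"application/epub+zip":
            problems.append("'mimetype' has unexpected content")
        # Rule 2: META-INF/container.xml exists and its rootfiles are in the archive.
        names = set(zf.namelist())
        if "META-INF/container.xml" not in names:
            problems.append("missing META-INF/container.xml")
            return problems
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        for rootfile in container.iter(f"{{{CONTAINER_NS}}}rootfile"):
            path = rootfile.get("full-path")
            if path not in names:
                problems.append(f"rootfile {path!r} not found in archive")
    return problems
```

An empty list only means these particular rules pass; the package document, navigation and XHTML content can still be invalid.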
Hi,

never mind not being able to add the books to online marketplaces (such as
Amazon) because, again, the output does not validate.
I think the simplest thing to do would be to update the wiki with a note informing readers that while ConTeXt can be used to generate an EPUB, that EPUB is likely to be unusable on devices without further transformation of the XML content. At least that way the knowledge is out there and people are forewarned that not all EPUB documents are equivalent.

Kindest regards.
On Fri, 6 Sep 2013, Thangalin wrote:
I think the simplest thing to do would be to update the wiki and have a note that informs readers that while ConTeXt can be used to generate an EPUB, it is likely that that EPUB will be unusable for devices without further transformation of the XML content. At least that way the knowledge is out there and people are forewarned that not all EPUB documents are equivalent.
It would also be nice to add a table that lists the EPUB readers (hardware and software) and whether ConTeXt-produced EPUB documents work on them.

Aditya
On 9/6/2013 10:20 PM, Thangalin wrote:
Well, given that ConTeXt doesn't actually produce validating EPUB documents, I suspect not many people will actually use that feature. It's great in theory, but if it produces books that don't actually work on the Kindle or Kobo, then it's unusable in practice -- never mind not being able to add the books to online marketplaces (such as Amazon) because, again, the output does not validate.
context doesn't produce epub (which at this moment is so much in flux that i would have to keep updating, which is fine if i'd use it myself or in projects at pragma, but not just for the sake of keeping up) but does an export to xml (*.export)

as a bonus it can output some extra stuff so that we can preview in a browser that can deal with xml+css (and a few xhtml tags for hyperlinks)

then there is mtx-epub that can make an epub, but that is a moving target (at some point we stopped extending it, waiting for a decent standard)

so, i'd never claim that context produces epub, but it can be used in a workflow that involves epub, as it outputs xml which can be transformed

supporting all variants of epub in the backend would be the same as hardcoding all kinds of xml dtds in the frontend (docbook, tei, whatever); instead we provide a general xml handler and a general xml export

Hans
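To make the "workflow that involves epub" point concrete: once the export has been transformed into XHTML and CSS, what remains is packaging. The following is a minimal sketch of the container step only, using Python's standard `zipfile`; it is not what mtx-epub does, and both the function name and the package path `OEBPS/content.opf` are hypothetical choices. The caller still has to supply a valid OPF package document and the content files it lists.

```python
import io
import zipfile

# Points the reading system at the package document; the path is a hypothetical choice.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def package_epub(files: dict[str, bytes]) -> bytes:
    """Zip already-generated EPUB content (path -> bytes) into an OCF container.

    `files` must include OEBPS/content.opf plus the XHTML/CSS it references;
    this function only takes care of the container layout.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # The mimetype entry must be first and stored without compression.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        for path, data in files.items():
            zf.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

Usage would be along the lines of `package_epub({"OEBPS/content.opf": opf_bytes, "OEBPS/chapter1.xhtml": xhtml_bytes})`, with the hypothetical file names matching whatever the OPF manifest declares. Keeping packaging separate from the transformation mirrors the split described above: ConTeXt provides the general XML export, and the EPUB-specific layer lives in the backend of the workflow.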
Hi,

so, i'd never claim that context produces epub but it can be used in a
workflow that involves epub as it outputs xml which can be transformed
That's a distinction that either might not matter or is sometimes lost:

http://tex.stackexchange.com/a/17642/2148
http://wiki.contextgarden.net/epub

"ConTeXt has preliminary epub (http://en.wikipedia.org/wiki/EPUB) support..."

Does ConTeXt refer to a suite of tools, or only the "context" command? Either way, it appears that the line between the command and the tool set is blurred a bit. This is completely understandable, too, as you wouldn't want to write "the ConTeXt suite of tools includes a command, mtxrun, that can produce EPUB files" all the time when talking about EPUBs.
supporting all variants of epub in the backend would be the same as hardcoding all kinds of xml dtds in the frontend (docbook, tei, whatever); instead we provide a general xml handler and a general xml export
That paragraph would be an excellent addition to the wiki; not sure where, though.

Kind regards.
participants (5)
- Aditya Mahajan
- Hans Hagen
- Mica Semrick
- Michael Hallgren
- Thangalin