xpdf/poppler usage: Movements on the poppler side, and a request for help
Dear all,

recently more projects that do not need any of the fancy graphical interfaces are considering using poppler, e.g. CUPS. However, the poppler developers plan to make unavailable the old, undocumented, never planned and unmaintainable API of "plain libpoppler". These two wishes together give us a fair chance that someone will actually do the work and develop a clean API for a plain C-only poppler, without any graphics payload.

Therefore it would be nice if someone among the pdfTeX developers would take part in the discussion, in particular help answer the question in http://lists.freedesktop.org/archives/poppler/2006-October/002260.html

,----
| What functionality would the tetex people need exported from such an
| API?
`----

At a short glance over pdftoepdf.cc and pdftosrc.cc, I only found these symbols:

Ref GfxFont GBool PDFDoc GString LinkDest Stream

But in fact there may be more - and you may also have some wishes about PDF parsing and manipulation that need to be added yet. Is there anyone who would take up this task?

TIA, Frank

P.S. For those who don't remember what this is about: The frequent security bugs in xpdf code, sometimes exploitable with hand-crafted pdf files, are considered to be also a security problem in pdfTeX and other software that embeds such code, by the security teams of different Linux distributions. And it causes major headaches, since each embedded xpdf version needs a slightly changed patch. The goal is to replace the xpdf code with a poppler copy, so that static linking would still be possible, but distributions could also link dynamically and fix security problems with just one new package for libpoppler.

--
Dr. Frank Küster
Single Molecule Spectroscopy, Protein Folding @ Inst. f. Biochemie, Univ. Zürich
Debian Developer (teTeX/TeXLive)
it makes more sense then to look into http://ccxvii.net/apparition/ because "However, the poppler developers plan to make unavailable the old, undocumented, never planned and unmaintainable API of "plain libpoppler"" is not something you want to repeat again

or alternatively, try to maintain a healthy relationship with Derek and stick to xpdf

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74
www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
Hans Hagen
it makes more sense then to look into http://ccxvii.net/apparition/
because "However, the poppler developers plan to make unavailable the old, undocumented, never planned and unmaintainable API of "plain libpoppler" is not something you want to repeat again
I am not the one to judge whether MuPDF or xpdf/poppler is the better choice. But I don't understand your argument about "is not something you want to repeat again". There never existed a clean API for xpdf, people just used things they needed from different parts of the code. poppler has forked xpdf to create a shared library, which is a move I fully support. poppler has built libraries with a well-defined, reliable API on top of that, but has also provided the undefined "API" that xpdf code-users are used to, and therefore poppler can currently be used as a (nearly, one data type is renamed) drop-in replacement for xpdf. If the poppler developers now consider to no longer provide this unreliable "API", but offer to create one more library with a well-defined, reliable API for non-display uses - then I would say this is something that speaks *for* poppler. It seems to indicate that they care for their users (and don't try to hide the problems with using an undocumented API). If it turns out that the poppler people are not willing to listen to pdfTeX developers when it is about creating a non-display library version: Then we can still decide that we should not switch to poppler (or rather, for sure we will, since we don't want pdfTeX to be linked to qt or gtk or such). But I think we should not miss the opportunity to tell them our wishes.
or alternatively, try to maintain a healthy relationship with Derek and stick to xpdf
I don't see why a healthy relationship with Derek is needed just to use the code, but the point is that a) there are valid reasons not to use the code as long as it can only be linked statically (note that at least Debian and Ubuntu Linux already link pdfTeX against libpoppler for those reasons) b) people who have tried to maintain or build up a relationship to Derek have not been able to convince him to provide a shared library. So while I have nothing against MuPDF, I still think some pdfTeX developer should get in touch with the poppler people and communicate with them. They have asked, and after all I expect that it will be much easier to switch to a poppler library that has been tailored along our wishes, than to MuPDF which exists independently and doesn't seem to have any relationship to the xpdf code we currently use. Regards, Frank -- Dr. Frank Küster Single Molecule Spectroscopy, Protein Folding @ Inst. f. Biochemie, Univ. Zürich Debian Developer (teTeX/TeXLive)
Frank Küster wrote:
If the poppler developers now consider to no longer provide this unreliable "API", but offer to create one more library with a well-defined, reliable API for non-display uses - then I would say this is something that speaks *for* poppler. It seems to indicate that they care for their users (and don't try to hide the problems with using an undocumented API).
ok, i misunderstood that part; i was under the impression that it were another group who had to take up that api
If it turns out that the poppler people are not willing to listen to pdfTeX developers when it is about creating a non-display library version: Then we can still decide that we should not switch to poppler (or rather, for sure we will, since we don't want pdfTeX to be linked to qt or gtk or such).
it all depends on how portable to other platforms the library is; i assume that it is the intention to have it running on each platform then; concerning an api ... i think that the main question is to what extent we want to be able to manipulate the content of to-be-embedded objects (for instance in the perspective of merging annotations, changing colors, etc); i think that currently only font related objects are somehow manipulated; maybe martin/thanh can tell how much additional code is written for that on top of xpdf; maybe they have a kind of api spec already (i never looked into the source in detail)
b) people who have tried to maintain or build up a relationship to Derek have not been able to convince him to provide a shared library.
hm, i like static binaries (having only bad experiences with libs when updating but that's another story)
So while I have nothing against MuPDF, I still think some pdfTeX developer should get in touch with the poppler people and communicate with them. They have asked, and after all I expect that it will be much easier to switch to a poppler library that has been tailored along our wishes, than to MuPDF which exists independently and doesn't seem to have any relationship to the xpdf code we currently use.
it depends if mupdf will provide some manipulation features (hard to maintain independently); i happily leave cooking up a spec to the pdftex code wizards, as long as they can guarantee that pdftex produces the same output on each platform; that's the part that worries me most because in general tex development has a rather bad history of cross platform code development (those os-wars). Hans ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl -----------------------------------------------------------------
Hans Hagen wrote:
[...] that's the part that worries me most because in general tex development has a rather bad history of cross platform code development
W H A A A T ? ? ! ! ! ! ! TeX is surely /the/ program, /par excellence/, that is guaranteed to produce identical results on every platform. I genuinely don't understand to what "rather bad history of cross platform code development" you refer, Hans. BNB cc'd, because I suspect she may want to comment on this. ** Phil.
Disgusted of Tunbridge Wells wrote:
Hans Hagen wrote:
[...] that's the part that worries me most because in general tex
development has a rather bad history of cross platform code development
W H A A A T ? ? ! ! ! ! !
If I was a windows or traditional macintosh user, I would have said precisely the same thing. Taco
Disgusted of Tunbridge Wells wrote:
Hans Hagen wrote:
[...] that's the part that worries me most because in general tex
development has a rather bad history of cross platform code development
W H A A A T ? ? ! ! ! ! !
TeX is surely /the/ program, /par excellence/, that is guaranteed to produce identical results on every platform. I genuinely don't understand to what "rather bad history of cross platform code development" you refer, Hans. BNB cc'd, because I suspect she may want to comment on this.
that's another matter; the fact that it does produce identical results is related to the tex kernel, not to the actual binary that goes beyond tex by being integrated in a cross platform tex environment (say web2c); https://xemtex.groups.foundry.supelec.fr/xemtex-web-gb-2-5.html summarizes this nicely; anyhow, it has been discussed frequently at recent user group meetings Hans ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl -----------------------------------------------------------------
Hans Hagen wrote:
that's another matter; the fact that it does produce identical results is related to the tex kernel, not to the actual binary that goes beyond tex by being integrated in a cross platform tex environment (say web2c);
https://xemtex.groups.foundry.supelec.fr/xemtex-web-gb-2-5.html
summarizes this nicely.
Yes, I read Fabrice's remarks some time ago, and they saddened me enormously (just as I am saddened every time licensing issues, or open-source considerations, or whatever, get in the way of making useful software generally available). But the fact remains that TeX itself is probably the most reliable large piece of software in the world when it comes to reproducibility between platforms : the can of worms that was introduced when Pascal ceased to be regarded as a viable source language for direct code generation is surely another issue completely. I confess to remaining baffled as to why no-one has ever attempted a Delphi port for Windows, given that Delphi is actually a rather radically re-engineered Pascal derivative ... ** Phil.
2006/10/27, Frank Küster
recently more projects that do not need any of the fancy graphical interfaces are considering using poppler, e.g. CUPS. However, the poppler developers plan to make unavailable the old, undocumented, never planned and unmaintainable API of "plain libpoppler".
Whatever that is.
These two wishes together give us a fair chance that someone will actually do the work and develop a clean API for a plain C-only poppler, without any graphics payload. Therefore it would be nice if someone among the pdfTeX developers would take part in the discussion, in particular help answer the question in
http://lists.freedesktop.org/archives/poppler/2006-October/002260.html
,---- | What functionality would the tetex people need exported from such an | API? `----
I just did.
At a short glance over pdftoepdf.cc and pdftosrc.cc, I only found these symbols:
Ref GfxFont GBool PDFDoc GString LinkDest Stream
But in fact there may be more - and you may also have some wishes about PDF parsing and manipulation that need to be added yet.
If we would start pdfTeX now, we would probably use much more (Taco, are you listening? :-). E.g. pdfTeX has a complete machinery for parsing font files, while poppler probably also has code for that. And the code for writing objects is also duplicated and quite low-level in pdfTeX. So there is definitely duplicate functionality. A very short term goal (i.e. 1.40) would be to have code in utils.c:initversionstring to handle the case where poppler is used instead of xpdf. And to have the patches you already distribute for using poppler in pdfTeX. Best Martin
Martin Schröder wrote:
But in fact there may be more - and you may also have some wishes about PDF parsing and manipulation that need to be added yet.
If we would start pdfTeX now, we would probably use much more (Taco, are you listening? :-). E.g. pdfTeX has a complete machinery for parsing font files, while poppler probably also has code for that. And the code for writing objects is also duplicated and quite low-level in pdfTeX. So there is definitely duplicate functionality.
If poppler is just using fontconfig + freetype2 for local font discovery for to-be-opened pdf docs (this is my current understanding), then that would be only about as useful for pdfTeX as that same code in xetex already is. And that is not very much, I am afraid. OTOH, if poppler is not only a pdf rendering library but also a pdf creation library (or aspires to become that), then it is quite likely that there will be some overlap that could possibly be merged one way or the other. I do not know enough about poppler (or about pdf in general, for that matter) to say something definite about this, sorry. I never have had to do anything with the pdf image inclusion, so I am not a suited person to respond about that part, at all. Best, Taco
Taco Hoekwater
I do not know enough about poppler (or about pdf in general, for that matter) to say something definite about this, sorry.
Me neither, except that I know that very basically and non-technically poppler is "xpdf as a shared library, plus better displaying capabilities". Regards, Frank -- Dr. Frank Küster Single Molecule Spectroscopy, Protein Folding @ Inst. f. Biochemie, Univ. Zürich Debian Developer (teTeX/TeXLive)
Frank Küster wrote:
Taco Hoekwater wrote:
I do not know enough about poppler (or about pdf in general, for that matter) to say something definite about this, sorry.
Me neither, except that I know that very basically and non-technically poppler is "xpdf as a shared library, plus better displaying capabilities".
i know enough of pdf and needs to comment on an api proposal once it's there, but i think that thanh/martin/hartmut need to cook up the first specs Hans ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl -----------------------------------------------------------------
participants (5)
- Disgusted of Tunbridge Wells
- Frank Küster
- Hans Hagen
- Martin Schröder
- Taco Hoekwater