On Fri, Feb 20, 2009 at 4:09 PM, Thomas A. Schmitz wrote:
On Feb 19, 2009, at 3:10 PM, luigi scarso wrote:
Luigi,
thanks so much for your patient replies. I have now begun to play with Python's lxml. It offers a lot, maybe too much for a beginner. One advantage I see for my immediate needs is that it lets me use Python's regular expressions and control structures, which may make the code easier to maintain and adapt than the rather clumsy XSLT syntax; it may be a big help for the rather messy OpenOffice XML that I want to process.
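To make the point about regular expressions and control structures concrete, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs; the inline sample document, the helper name auto_styled_spans and the "T1, T2, ..." pattern for automatic style names are assumptions made for illustration, not taken from a real OpenOffice file.

```python
import re
import xml.etree.ElementTree as ET

URI_TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical fragment standing in for part of a content.xml file.
SAMPLE = (
    '<text:p xmlns:text="%s">'
    'plain <text:span text:style-name="T1">bold</text:span>'
    ' and <text:span text:style-name="Emphasis">named</text:span>'
    '</text:p>' % URI_TEXT
)

# Assumption: automatic character styles are named T1, T2, ...
AUTO_STYLE = re.compile(r"^T\d+$")


def auto_styled_spans(root):
    """Return the text of every text:span whose style name matches AUTO_STYLE."""
    found = []
    for span in root.iter("{%s}span" % URI_TEXT):
        # Namespaced attributes use the same {uri}name form as tags.
        style = span.get("{%s}style-name" % URI_TEXT, "")
        if AUTO_STYLE.match(style):
            found.append(span.text)
    return found


root = ET.fromstring(SAMPLE)
print(auto_styled_spans(root))  # ['bold']
```

The same loop-plus-regex filter would need a fair amount of template machinery in XSLT 1.0, which has no regular expression support.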
also

Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
URI_OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
URI_STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
URI_TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
URI_TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
URI_DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
URI_FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
URI_XLINK = "http://www.w3.org/1999/xlink"
URI_DC = "http://purl.org/dc/elements/1.1/"
URI_META = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
URI_NUMBER = "urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
URI_PRESENTATION = "urn:oasis:names:tc:opendocument:xmlns:presentation:1.0"
URI_SVG = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
URI_CHART = "urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
URI_DR3D = "urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
URI_MATH = "http://www.w3.org/1998/Math/MathML"
URI_FORM = "urn:oasis:names:tc:opendocument:xmlns:form:1.0"
URI_SCRIPT = "urn:oasis:names:tc:opendocument:xmlns:script:1.0"
URI_OOO = "http://openoffice.org/2004/office"
URI_OOOW = "http://openoffice.org/2004/writer"
URI_OOOC = "http://openoffice.org/2004/calc"
URI_DOM = "http://www.w3.org/2001/xml-events"
URI_XFORMS = "http://www.w3.org/2002/xforms"
URI_XSD = "http://www.w3.org/2001/XMLSchema"
URI_XSI = "http://www.w3.org/2001/XMLSchema-instance"
URI_FIELD = "urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:field:1.0"
NSMAP_OO = {
    "office": URI_OFFICE,
    "style": URI_STYLE,
    "text": URI_TEXT,
    "table": URI_TABLE,
    "draw": URI_DRAW,
    "fo": URI_FO,
    "xlink": URI_XLINK,
    "dc": URI_DC,
    "meta": URI_META,
    "number": URI_NUMBER,
    "presentation": URI_PRESENTATION,
    "svg": URI_SVG,
    "chart": URI_CHART,
    "dr3d": URI_DR3D,
    "math": URI_MATH,
    "form": URI_FORM,
    "script": URI_SCRIPT,
    "ooo": URI_OOO,
    "ooow": URI_OOOW,
    "oooc": URI_OOOC,
    "dom": URI_DOM,
    "xforms": URI_XFORMS,
    "xsd": URI_XSD,
    "xsi": URI_XSI,
    "field": URI_FIELD,
}
from lxml import etree
tree = etree.parse(file('t.xml'))
foo = tree.getroot()
[child.tag for child in foo.iterdescendants(tag='{%s}span' % URI_TEXT)]
['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span']
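The session above selects elements by their fully qualified {uri}name tag. A prefix map like NSMAP_OO allows the shorter text:span form in path queries instead. The sketch below shows both lookups side by side on Python 3, using the standard library's xml.etree.ElementTree in place of lxml so it runs without extra installs; the inline sample document stands in for t.xml and is an assumption, as is the trimmed-down NSMAP.

```python
import io
import xml.etree.ElementTree as ET

URI_OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
URI_TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Trimmed-down prefix map; only the two namespaces used below.
NSMAP = {"office": URI_OFFICE, "text": URI_TEXT}

# Hypothetical stand-in for the contents of t.xml.
SAMPLE = (
    '<office:document-content xmlns:office="%s" xmlns:text="%s">'
    '<office:body><office:text>'
    '<text:p>Hello <text:span>world</text:span></text:p>'
    '</office:text></office:body>'
    '</office:document-content>' % (URI_OFFICE, URI_TEXT)
).encode("utf-8")

tree = ET.parse(io.BytesIO(SAMPLE))
root = tree.getroot()

# Lookup by fully qualified tag, as in the session above.
by_clark = [el.tag for el in root.iter("{%s}span" % URI_TEXT)]

# Lookup by prefix, resolved through the namespace map.
by_prefix = [el.tag for el in root.findall(".//text:span", NSMAP)]

print(by_clark)
print(by_prefix == by_clark)
```

With lxml itself, the same map can be passed as the namespaces argument of the element's xpath() method, which accepts full XPath 1.0 expressions rather than the limited path syntax used by findall().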
Have a look at http://opendocumentfellowship.com/projects/odfpy too.

--
luigi