[18214] | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd"> |
---|
| 2 | <html> |
---|
| 3 | <head> |
---|
| 4 | <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> |
---|
| 5 | <style type="text/css"><!-- |
---|
| 6 | TD {font-family: Verdana,Arial,Helvetica} |
---|
| 7 | BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em} |
---|
| 8 | H1 {font-family: Verdana,Arial,Helvetica} |
---|
| 9 | H2 {font-family: Verdana,Arial,Helvetica} |
---|
| 10 | H3 {font-family: Verdana,Arial,Helvetica} |
---|
| 11 | A:link, A:visited, A:active { text-decoration: underline } |
---|
| 12 | --></style> |
---|
| 13 | <title>Library internals</title> |
---|
| 14 | </head> |
---|
| 15 | <body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000"> |
---|
| 16 | <table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr> |
---|
| 17 | <td width="100"> |
---|
| 18 | <a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo"></a><a href="http://www.redhat.com"><img src="redhat.gif" alt="Red Hat Logo"></a><div align="left"><a href="http://xmlsoft.org/XSLT/"><img src="Libxslt-Logo-180x168.gif" alt="Made with Libxslt Logo"></a></div> |
---|
| 19 | </td> |
---|
| 20 | <td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center"> |
---|
| 21 | <h1>The XSLT C library for Gnome</h1> |
---|
| 22 | <h2>Library internals</h2> |
---|
| 23 | </td></tr></table></td></tr></table></td> |
---|
| 24 | </tr></table> |
---|
| 25 | <table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr> |
---|
| 26 | <td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td> |
---|
| 27 | <table width="100%" border="0" cellspacing="1" cellpadding="3"> |
---|
| 28 | <tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr> |
---|
[18542] | 29 | <tr><td bgcolor="#fffacd"> |
---|
| 30 | <form action="search.php" enctype="application/x-www-form-urlencoded" method="GET"> |
---|
| 31 | <input name="query" type="TEXT" size="20" value=""><input name="submit" type="submit" value="Search ..."> |
---|
| 32 | </form> |
---|
| 33 | <ul> |
---|
[18214] | 34 | <li><a href="index.html">Home</a></li> |
---|
| 35 | <li><a href="intro.html">Introduction</a></li> |
---|
| 36 | <li><a href="docs.html">Documentation</a></li> |
---|
| 37 | <li><a href="bugs.html">Reporting bugs and getting help</a></li> |
---|
| 38 | <li><a href="help.html">How to help</a></li> |
---|
| 39 | <li><a href="downloads.html">Downloads</a></li> |
---|
| 40 | <li><a href="FAQ.html">FAQ</a></li> |
---|
| 41 | <li><a href="news.html">News</a></li> |
---|
| 42 | <li><a href="xsltproc2.html">The xsltproc tool</a></li> |
---|
| 43 | <li><a href="docbook.html">DocBook</a></li> |
---|
| 44 | <li><a href="API.html">The programming API</a></li> |
---|
| 45 | <li><a href="python.html">Python and bindings</a></li> |
---|
| 46 | <li><a href="internals.html">Library internals</a></li> |
---|
| 47 | <li><a href="extensions.html">Writing extensions</a></li> |
---|
| 48 | <li><a href="contribs.html">Contributions</a></li> |
---|
| 49 | <li> |
---|
| 50 | <a href="xslt.html">flat page</a>, <a href="site.xsl">stylesheet</a> |
---|
| 51 | </li> |
---|
| 52 | </ul> |
---|
| 53 | </td></tr> |
---|
| 54 | </table> |
---|
| 55 | <table width="100%" border="0" cellspacing="1" cellpadding="3"> |
---|
| 56 | <tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr> |
---|
| 57 | <tr><td bgcolor="#fffacd"><ul> |
---|
| 58 | <li><a href="tutorial/libxslttutorial.html">Tutorial</a></li> |
---|
| 59 | <li><a href="xsltproc.html">Man page for xsltproc</a></li> |
---|
| 60 | <li><a href="http://mail.gnome.org/archives/xslt/">Mail archive</a></li> |
---|
| 61 | <li><a href="http://xmlsoft.org/">XML libxml</a></li> |
---|
| 62 | <li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li> |
---|
| 63 | <li><a href="ftp://xmlsoft.org/">FTP</a></li> |
---|
[18542] | 64 | <li><a href="http://www.zlatkovic.com/projects/libxml/">Windows binaries</a></li> |
---|
[18214] | 65 | <li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li> |
---|
| 66 | <li><a href="http://www.zveno.com/open_source/libxml2xslt.html">MacOsX binaries</a></li> |
---|
| 67 | <li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li> |
---|
| 68 | <li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxslt">Bug Tracker</a></li> |
---|
| 69 | <li><a href="http://xsldbg.sourceforge.net/">Xsldbg Debugger</a></li> |
---|
| 70 | </ul></td></tr> |
---|
| 71 | </table> |
---|
[18542] | 72 | <table width="100%" border="0" cellspacing="1" cellpadding="3"> |
---|
| 73 | <tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr> |
---|
| 74 | <tr><td bgcolor="#fffacd"><ul> |
---|
| 75 | <li><a href="APIchunk0.html">Alphabetic</a></li> |
---|
| 76 | <li><a href="APIconstructors.html">Constructors</a></li> |
---|
| 77 | <li><a href="APIfunctions.html">Functions/Types</a></li> |
---|
| 78 | <li><a href="APIfiles.html">Modules</a></li> |
---|
| 79 | <li><a href="APIsymbols.html">Symbols</a></li> |
---|
| 80 | </ul></td></tr> |
---|
| 81 | </table> |
---|
[18214] | 82 | </td></tr></table></td> |
---|
| 83 | <td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"> |
---|
| 84 | <h3>Table of contents</h3> |
---|
| 85 | <ul> |
---|
| 86 | <li><a href="internals.html#Introducti">Introduction</a></li> |
---|
| 87 | <li><a href="internals.html#Basics">Basics</a></li> |
---|
| 88 | <li><a href="internals.html#Keep">Keep it simple stupid</a></li> |
---|
| 89 | <li><a href="internals.html#libxml">The libxml nodes</a></li> |
---|
| 90 | <li><a href="internals.html#XSLT">The XSLT processing steps</a></li> |
---|
| 91 | <li><a href="internals.html#XSLT1">The XSLT stylesheet compilation</a></li> |
---|
| 92 | <li><a href="internals.html#XSLT2">The XSLT template compilation</a></li> |
---|
| 93 | <li><a href="internals.html#processing">The processing itself</a></li> |
---|
| 94 | <li><a href="internals.html#XPath">XPath expressions compilation</a></li> |
---|
| 95 | <li><a href="internals.html#XPath1">XPath interpretation</a></li> |
---|
| 96 | <li><a href="internals.html#Descriptio">Description of XPath |
---|
| 97 | Objects</a></li> |
---|
| 98 | <li><a href="internals.html#XPath3">XPath functions</a></li> |
---|
| 99 | <li><a href="internals.html#stack">The variables stack frame</a></li> |
---|
| 100 | <li><a href="internals.html#Extension">Extension support</a></li> |
---|
| 101 | <li><a href="internals.html#Futher">Further reading</a></li> |
---|
| 102 | <li><a href="internals.html#TODOs">TODOs</a></li> |
---|
| 103 | </ul> |
---|
| 104 | <h3><a name="Introducti2">Introduction</a></h3> |
---|
| 105 | <p>This document describes the processing of <a href="http://xmlsoft.org/XSLT/">libxslt</a>, the <a href="http://www.w3.org/TR/xslt">XSLT</a> C library developed for the <a href="http://www.gnome.org/">Gnome</a> project.</p> |
---|
| 106 | <p>Note: this documentation is by definition incomplete and I am not good at |
---|
| 107 | spelling, grammar, so patches and suggestions are <a href="mailto:veillard@redhat.com">really welcome</a>.</p> |
---|
| 108 | <h3><a name="Basics1">Basics</a></h3> |
---|
| 109 | <p>XSLT is a transformation language. It takes an input document and a |
---|
| 110 | stylesheet document and generates an output document:</p> |
---|
| 111 | <p align="center"><img src="processing.gif" alt="the XSLT processing model"></p> |
---|
| 112 | <p>Libxslt is written in C. It relies on <a href="http://www.xmlsoft.org/">libxml</a>, the XML C library for Gnome, for |
---|
| 113 | the following operations:</p> |
---|
| 114 | <ul> |
---|
| 115 | <li>parsing files</li> |
---|
| 116 | <li>building the in-memory DOM structure associated with the documents |
---|
| 117 | handled</li> |
---|
| 118 | <li>the XPath implementation</li> |
---|
| 119 | <li>serializing back the result document to XML and HTML. (Text is handled |
---|
| 120 | directly.)</li> |
---|
| 121 | </ul> |
---|
| 122 | <h3><a name="Keep1">Keep it simple stupid</a></h3> |
---|
| 123 | <p>Libxslt is not very specialized. It is built under the assumption that all |
---|
| 124 | nodes from the source and output document can fit in the virtual memory of |
---|
| 125 | the system. There is a big trade-off there. It is fine for reasonably sized |
---|
| 126 | documents but may not be suitable for large sets of data. The gain is that it |
---|
| 127 | can be used in a relatively versatile way. The input or output may never be |
---|
| 128 | serialized, but the size of documents it can handle are limited by the size |
---|
| 129 | of the memory available.</p> |
---|
| 130 | <p>More specialized memory handling approaches are possible, like building |
---|
| 131 | the input tree from a serialization progressively as it is consumed, |
---|
| 132 | factoring repetitive patterns, or even on-the-fly generation of the output as |
---|
| 133 | the input is parsed but it is possible only for a limited subset of the |
---|
| 134 | stylesheets. In general the implementation of libxslt follows the following |
---|
| 135 | pattern:</p> |
---|
| 136 | <ul> |
---|
| 137 | <li>KISS (keep it simple stupid)</li> |
---|
| 138 | <li>when there is a clear bottleneck optimize on top of this simple |
---|
| 139 | framework and refine only as much as is needed to reach the expected |
---|
| 140 | result</li> |
---|
| 141 | </ul> |
---|
| 142 | <p>The result is not that bad, clearly one can do a better job but more |
---|
| 143 | specialized too. Most optimization like building the tree on-demand would |
---|
| 144 | need serious changes to the libxml XPath framework. An easy step would be to |
---|
| 145 | serialize the output directly (or call a set of SAX-like output handler to |
---|
| 146 | keep this a flexible interface) and hence avoid the memory consumption of the |
---|
| 147 | result.</p> |
---|
| 148 | <h3><a name="libxml">The libxml nodes</a></h3> |
---|
| 149 | <p>DOM-like trees, as used and generated by libxml and libxslt, are |
---|
| 150 | relatively complex. Most node types follow the given structure except a few |
---|
| 151 | variations depending on the node type:</p> |
---|
| 152 | <p align="center"><img src="node.gif" alt="description of a libxml node"></p> |
---|
| 153 | <p>Nodes carry a <strong>name</strong> and the node <strong>type</strong> |
---|
| 154 | indicates the kind of node it represents, the most common ones are:</p> |
---|
| 155 | <ul> |
---|
| 156 | <li>document nodes</li> |
---|
| 157 | <li>element nodes</li> |
---|
| 158 | <li>text nodes</li> |
---|
| 159 | </ul> |
---|
| 160 | <p>For the XSLT processing, entity nodes should not be generated (i.e. they |
---|
| 161 | should be replaced by their content). Most nodes also contains the following |
---|
| 162 | "navigation" informations:</p> |
---|
| 163 | <ul> |
---|
| 164 | <li>the containing <strong>doc</strong>ument</li> |
---|
| 165 | <li>the <strong>parent</strong> node</li> |
---|
| 166 | <li>the first <strong>children</strong> node</li> |
---|
| 167 | <li>the <strong>last</strong> children node</li> |
---|
| 168 | <li>the <strong>prev</strong>ious sibling</li> |
---|
| 169 | <li>the following sibling (<strong>next</strong>)</li> |
---|
| 170 | </ul> |
---|
| 171 | <p>Elements nodes carries the list of attributes in the properties, an |
---|
| 172 | attribute itself holds the navigation pointers and the children list (the |
---|
| 173 | attribute value is not represented as a simple string to allow usage of |
---|
| 174 | entities references).</p> |
---|
| 175 | <p>The <strong>ns</strong> points to the namespace declaration for the |
---|
| 176 | namespace associated to the node, <strong>nsDef</strong> is the linked list |
---|
| 177 | of namespace declaration present on element nodes.</p> |
---|
| 178 | <p>Most nodes also carry an <strong>_private</strong> pointer which can be |
---|
| 179 | used by the application to hold specific data on this node.</p> |
---|
| 180 | <h3><a name="XSLT">The XSLT processing steps</a></h3> |
---|
| 181 | <p>There are a few steps which are clearly decoupled at the interface |
---|
| 182 | level:</p> |
---|
| 183 | <ol> |
---|
| 184 | <li>parse the stylesheet and generate a DOM tree</li> |
---|
| 185 | <li>take the stylesheet tree and build a compiled version of it (the |
---|
| 186 | compilation phase)</li> |
---|
| 187 | <li>take the input and generate a DOM tree</li> |
---|
| 188 | <li>process the stylesheet against the input tree and generate an output |
---|
| 189 | tree</li> |
---|
| 190 | <li>serialize the output tree</li> |
---|
| 191 | </ol> |
---|
| 192 | <p>A few things should be noted here:</p> |
---|
| 193 | <ul> |
---|
| 194 | <li>the steps 1/ 3/ and 5/ are optional</li> |
---|
| 195 | <li>the stylesheet obtained at 2/ can be reused by multiple processing 4/ |
---|
| 196 | (and this should also work in threaded programs)</li> |
---|
| 197 | <li>the tree provided in 2/ should never be freed using xmlFreeDoc, but by |
---|
| 198 | freeing the stylesheet.</li> |
---|
| 199 | <li>the input tree 4/ is not modified except the _private field which may |
---|
| 200 | be used for labelling keys if used by the stylesheet</li> |
---|
| 201 | </ul> |
---|
| 202 | <h3><a name="XSLT1">The XSLT stylesheet compilation</a></h3> |
---|
| 203 | <p>This is the second step described. It takes a stylesheet tree, and |
---|
| 204 | "compiles" it. This associates to each node a structure stored in the |
---|
| 205 | _private field and containing information computed in the stylesheet:</p> |
---|
| 206 | <p align="center"><img src="stylesheet.gif" alt="a compiled XSLT stylesheet"></p> |
---|
| 207 | <p>One xsltStylesheet structure is generated per document parsed for the |
---|
| 208 | stylesheet. XSLT documents allow includes and imports of other documents, |
---|
| 209 | imports are stored in the <strong>imports</strong> list (hence keeping the |
---|
| 210 | tree hierarchy of includes which is very important for a proper XSLT |
---|
| 211 | processing model) and includes are stored in the <strong>doclist</strong> |
---|
| 212 | list. An imported stylesheet has a parent link to allow browsing of the |
---|
| 213 | tree.</p> |
---|
| 214 | <p>The DOM tree associated to the document is stored in <strong>doc</strong>. |
---|
| 215 | It is preprocessed to remove ignorable empty nodes and all the nodes in the |
---|
| 216 | XSLT namespace are subject to precomputing. This usually consist of |
---|
| 217 | extracting all the context information from the context tree (attributes, |
---|
| 218 | namespaces, XPath expressions), and storing them in an xsltStylePreComp |
---|
| 219 | structure associated to the <strong>_private</strong> field of the node.</p> |
---|
| 220 | <p>A couple of notable exceptions to this are XSLT template nodes (more on |
---|
| 221 | this later) and attribute value templates. If they are actually templates, |
---|
| 222 | the value cannot be computed at compilation time. (Some preprocessing could |
---|
| 223 | be done like isolation and preparsing of the XPath subexpressions but it's |
---|
| 224 | not done, yet.)</p> |
---|
| 225 | <p>The xsltStylePreComp structure also allows storing of the precompiled form |
---|
| 226 | of an XPath expression that can be associated to an XSLT element (more on |
---|
| 227 | this later).</p> |
---|
| 228 | <h3><a name="XSLT2">The XSLT template compilation</a></h3> |
---|
| 229 | <p>A proper handling of templates lookup is one of the keys of fast XSLT |
---|
| 230 | processing. (Given a node in the source document this is the process of |
---|
| 231 | finding which templates should be applied to this node.) Libxslt follows the |
---|
| 232 | hint suggested in the <a href="http://www.w3.org/TR/xslt#patterns">5.2 |
---|
| 233 | Patterns</a> section of the XSLT Recommendation, i.e. it doesn't evaluate it |
---|
| 234 | as an XPath expression but tokenizes it and compiles it as a set of rules to |
---|
| 235 | be evaluated on a candidate node. There usually is an indication of the node |
---|
| 236 | name in the last step of this evaluation and this is used as a key check for |
---|
| 237 | the match. As a result libxslt builds a relatively more complex set of |
---|
| 238 | structures for the templates:</p> |
---|
| 239 | <p align="center"><img src="templates.gif" alt="The templates related structure"></p> |
---|
| 240 | <p>Let's describe a bit more closely what is built. First the xsltStylesheet |
---|
| 241 | structure holds a pointer to the template hash table. All the XSLT patterns |
---|
| 242 | compiled in this stylesheet are indexed by the value of the the target |
---|
| 243 | element (or attribute, pi ...) name, so when a element or an attribute "foo" |
---|
| 244 | needs to be processed the lookup is done using the name as a key.</p> |
---|
| 245 | <p>Each of the patterns is compiled into an xsltCompMatch structure. It holds |
---|
| 246 | the set of rules based on the tokenization of the pattern stored in reverse |
---|
| 247 | order (matching is easier this way). It also holds some information about the |
---|
| 248 | previous matches used to speed up the process when one iterates over a set of |
---|
| 249 | siblings. (This optimization may be defeated by trashing when running |
---|
| 250 | threaded computation, it's unclear that this is a big deal in practice.) |
---|
| 251 | Predicate expressions are not compiled at this stage, they may be at run-time |
---|
| 252 | if needed, but in this case they are compiled as full XPath expressions (the |
---|
| 253 | use of some fixed predicate can probably be optimized, they are not yet).</p> |
---|
| 254 | <p>The xsltCompMatch are then stored in the hash table, the clash list is |
---|
| 255 | itself sorted by priority of the template to implement "naturally" the XSLT |
---|
| 256 | priority rules.</p> |
---|
| 257 | <p>Associated to the compiled pattern is the xsltTemplate itself containing |
---|
| 258 | the information required for the processing of the pattern including, of |
---|
| 259 | course, a pointer to the list of elements used for building the pattern |
---|
| 260 | result.</p> |
---|
| 261 | <p>Last but not least a number of patterns do not fit in the hash table |
---|
| 262 | because they are not associated to a name, this is the case for patterns |
---|
| 263 | applying to the root, any element, any attributes, text nodes, pi nodes, keys |
---|
| 264 | etc. Those are stored independently in the stylesheet structure as separate |
---|
| 265 | linked lists of xsltCompMatch.</p> |
---|
| 266 | <h3><a name="processing">The processing itself</a></h3> |
---|
| 267 | <p>The processing is defined by the XSLT specification (the basis of the |
---|
| 268 | algorithm is explained in <a href="http://www.w3.org/TR/xslt#section-Introduction">the Introduction</a> |
---|
| 269 | section). Basically it works by taking the root of the input document and |
---|
| 270 | applying the following algorithm:</p> |
---|
| 271 | <ol> |
---|
| 272 | <li>Finding the template applying to it. This is a lookup in the template |
---|
| 273 | hash table, walking the hash list until the node satisfies all the steps |
---|
| 274 | of the pattern, then checking the appropriate(s) global templates to see |
---|
| 275 | if there isn't a higher priority rule to apply</li> |
---|
| 276 | <li>If there is no template, apply the default rule (recurse on the |
---|
| 277 | children)</li> |
---|
| 278 | <li>else walk the content list of the selected templates, for each of them: |
---|
| 279 | <ul> |
---|
| 280 | <li>if the node is in the XSLT namespace then the node has a _private |
---|
| 281 | field pointing to the preprocessed values, jump to the specific |
---|
| 282 | code</li> |
---|
| 283 | <li>if the node is in an extension namespace, look up the associated |
---|
| 284 | behavior</li> |
---|
| 285 | <li>otherwise copy the node.</li> |
---|
| 286 | </ul> |
---|
| 287 | <p>The closure is usually done through the XSLT |
---|
| 288 | <strong>apply-templates</strong> construct recursing by applying the |
---|
| 289 | adequate template on the input node children or on the result of an |
---|
| 290 | associated XPath selection lookup.</p> |
---|
| 291 | </li> |
---|
| 292 | </ol> |
---|
| 293 | <p>Note that large parts of the input tree may not be processed by a given |
---|
| 294 | stylesheet and that on the opposite some may be processed multiple times. |
---|
| 295 | (This often is the case when a Table of Contents is built).</p> |
---|
| 296 | <p>The module <code>transform.c</code> is the one implementing most of this |
---|
| 297 | logic. <strong>xsltApplyStylesheet()</strong> is the entry point, it |
---|
| 298 | allocates an xsltTransformContext containing the following:</p> |
---|
| 299 | <ul> |
---|
| 300 | <li>a pointer to the stylesheet being processed</li> |
---|
| 301 | <li>a stack of templates</li> |
---|
| 302 | <li>a stack of variables and parameters</li> |
---|
| 303 | <li>an XPath context</li> |
---|
| 304 | <li>the template mode</li> |
---|
| 305 | <li>current document</li> |
---|
| 306 | <li>current input node</li> |
---|
| 307 | <li>current selected node list</li> |
---|
| 308 | <li>the current insertion points in the output document</li> |
---|
| 309 | <li>a couple of hash tables for extension elements and functions</li> |
---|
| 310 | </ul> |
---|
| 311 | <p>Then a new document gets allocated (HTML or XML depending on the type of |
---|
| 312 | output), the user parameters and global variables and parameters are |
---|
| 313 | evaluated. Then <strong>xsltProcessOneNode()</strong> which implements the |
---|
| 314 | 1-2-3 algorithm is called on the root element of the input. Step 1/ is |
---|
| 315 | implemented by calling <strong>xsltGetTemplate()</strong>, step 2/ is |
---|
| 316 | implemented by <strong>xsltDefaultProcessOneNode()</strong> and step 3/ is |
---|
| 317 | implemented by <strong>xsltApplyOneTemplate()</strong>.</p> |
---|
| 318 | <h3><a name="XPath">XPath expression compilation</a></h3> |
---|
| 319 | <p>The XPath support is actually implemented in the libxml module (where it |
---|
| 320 | is reused by the XPointer implementation). XPath is a relatively classic |
---|
| 321 | expression language. The only uncommon feature is that it is working on XML |
---|
| 322 | trees and hence has specific syntax and types to handle them.</p> |
---|
| 323 | <p>XPath expressions are compiled using <strong>xmlXPathCompile()</strong>. |
---|
| 324 | It will take an expression string in input and generate a structure |
---|
| 325 | containing the parsed expression tree, for example the expression:</p> |
---|
| 326 | <pre>/doc/chapter[title='Introduction']</pre> |
---|
| 327 | <p>will be compiled as</p> |
---|
| 328 | <pre>Compiled Expression : 10 elements |
---|
| 329 | SORT |
---|
| 330 | COLLECT 'child' 'name' 'node' chapter |
---|
| 331 | COLLECT 'child' 'name' 'node' doc |
---|
| 332 | ROOT |
---|
| 333 | PREDICATE |
---|
| 334 | SORT |
---|
| 335 | EQUAL = |
---|
| 336 | COLLECT 'child' 'name' 'node' title |
---|
| 337 | NODE |
---|
| 338 | ELEM Object is a string : Introduction |
---|
| 339 | COLLECT 'child' 'name' 'node' title |
---|
| 340 | NODE</pre> |
---|
| 341 | <p>This can be tested using the <code>testXPath</code> command (in the |
---|
| 342 | libxml codebase) using the <code>--tree</code> option.</p> |
---|
| 343 | <p>Again, the KISS approach is used. No optimization is done. This could be |
---|
| 344 | an interesting thing to add. <a href="http://www-106.ibm.com/developerworks/library/x-xslt2/?dwzone=x?open&l=132%2ct=gr%2c%2Bp=saxon">Michael |
---|
| 345 | Kay describes</a> a lot of possible and interesting optimizations done in |
---|
| 346 | Saxon which would be possible at this level. I'm unsure they would provide |
---|
| 347 | much gain since the expressions tends to be relatively simple in general and |
---|
| 348 | stylesheets are still hand generated. Optimizations at the interpretation |
---|
| 349 | sounds likely to be more efficient.</p> |
---|
| 350 | <h3><a name="XPath1">XPath interpretation</a></h3> |
---|
| 351 | <p>The interpreter is implemented by <strong>xmlXPathCompiledEval()</strong> |
---|
| 352 | which is the front-end to <strong>xmlXPathCompOpEval()</strong> the function |
---|
| 353 | implementing the evaluation of the expression tree. This evaluation follows |
---|
| 354 | the KISS approach again. It's recursive and calls |
---|
| 355 | <strong>xmlXPathNodeCollectAndTest()</strong> to collect nodes set when |
---|
| 356 | evaluating a <code>COLLECT</code> node.</p> |
---|
| 357 | <p>An evaluation is done within the framework of an XPath context stored in |
---|
| 358 | an <strong>xmlXPathContext</strong> structure, in the framework of a |
---|
| 359 | transformation the context is maintained within the XSLT context. Its content |
---|
| 360 | follows the requirements from the XPath specification:</p> |
---|
| 361 | <ul> |
---|
| 362 | <li>the current document</li> |
---|
| 363 | <li>the current node</li> |
---|
| 364 | <li>a hash table of defined variables (but not used by XSLT)</li> |
---|
| 365 | <li>a hash table of defined functions</li> |
---|
| 366 | <li>the proximity position (the place of the node in the current node |
---|
| 367 | list)</li> |
---|
| 368 | <li>the context size (the size of the current node list)</li> |
---|
| 369 | <li>the array of namespace declarations in scope (there also is a namespace |
---|
| 370 | hash table but it is not used in the XSLT transformation).</li> |
---|
| 371 | </ul> |
---|
| 372 | <p>For the purpose of XSLT an <strong>extra</strong> pointer has been added |
---|
| 373 | allowing to retrieve the XSLT transformation context. When an XPath |
---|
| 374 | evaluation is about to be performed, an XPath parser context is allocated |
---|
| 375 | containing and XPath object stack (this is actually an XPath evaluation |
---|
| 376 | context, this is a remain of the time where there was no separate parsing and |
---|
| 377 | evaluation phase in the XPath implementation). Here is an overview of the set |
---|
| 378 | of contexts associated to an XPath evaluation within an XSLT |
---|
| 379 | transformation:</p> |
---|
| 380 | <p align="center"><img src="contexts.gif" alt="The set of contexts associated "></p> |
---|
| 381 | <p>Clearly this is a bit too complex and confusing and should be refactored |
---|
| 382 | at the next set of binary incompatible releases of libxml. For example the |
---|
| 383 | xmlXPathCtxt has a lot of unused parts and should probably be merged with |
---|
| 384 | xmlXPathParserCtxt.</p> |
---|
| 385 | <h3><a name="Descriptio">Description of XPath Objects</a></h3> |
---|
| 386 | <p>An XPath expression manipulates XPath objects. XPath defines the default |
---|
| 387 | types boolean, numbers, strings and node sets. XSLT adds the result tree |
---|
| 388 | fragment type which is basically an unmodifiable node set.</p> |
---|
| 389 | <p>Implementation-wise, libxml follows again a KISS approach, the |
---|
| 390 | xmlXPathObject is a structure containing a type description and the various |
---|
| 391 | possibilities. (Using an enum could have gained some bytes.) In the case of |
---|
| 392 | node sets (or result tree fragments), it points to a separate xmlNodeSet |
---|
| 393 | object which contains the list of pointers to the document nodes:</p> |
---|
| 394 | <p align="center"><img src="object.gif" alt="An Node set object pointing to "></p> |
---|
| 395 | <p>The <a href="http://xmlsoft.org/html/libxml-xpath.html">XPath API</a> (and |
---|
| 396 | its <a href="http://xmlsoft.org/html/libxml-xpathinternals.html">'internal' |
---|
| 397 | part</a>) includes a number of functions to create, copy, compare, convert or |
---|
| 398 | free XPath objects.</p> |
---|
| 399 | <h3><a name="XPath3">XPath functions</a></h3> |
---|
| 400 | <p>All the XPath functions available to the interpreter are registered in the |
---|
| 401 | function hash table linked from the XPath context. They all share the same |
---|
| 402 | signature:</p> |
---|
| 403 | <pre>void xmlXPathFunc (xmlXPathParserContextPtr ctxt, int nargs);</pre> |
---|
| 404 | <p>The first argument is the XPath interpretation context, holding the |
---|
| 405 | interpretation stack. The second argument defines the number of objects |
---|
| 406 | passed on the stack for the function to consume (last argument is on top of |
---|
| 407 | the stack).</p> |
---|
| 408 | <p>Basically an XPath function does the following:</p> |
---|
| 409 | <ul> |
---|
| 410 | <li>check <code>nargs</code> for proper handling of errors or functions |
---|
| 411 | with variable numbers of parameters</li> |
---|
| 412 | <li>pop the parameters from the stack using <code>obj = |
---|
| 413 | valuePop(ctxt);</code> |
---|
| 414 | </li> |
---|
| 415 | <li>do the function specific computation</li> |
---|
| 416 | <li>push the result parameter on the stack using <code>valuePush(ctxt, |
---|
| 417 | res);</code> |
---|
| 418 | </li> |
---|
| 419 | <li>free up the input parameters with |
---|
| 420 | <code>xmlXPathFreeObject(obj);</code> |
---|
| 421 | </li> |
---|
| 422 | <li>return</li> |
---|
| 423 | </ul> |
---|
| 424 | <p>Sometime the work can be done directly by modifying in-situ the top object |
---|
| 425 | on the stack <code>ctxt->value</code>.</p> |
---|
| 426 | <h3><a name="stack">The XSLT variables stack frame</a></h3> |
---|
| 427 | <p>Not to be confused with XPath object stack, this stack holds the XSLT |
---|
| 428 | variables and parameters as they are defined through the recursive calls of |
---|
| 429 | call-template, apply-templates and default templates. This is used to define |
---|
| 430 | the scope of variables being called.</p> |
---|
| 431 | <p>This part seems to be the most urgent attention right now, first it is |
---|
| 432 | done in a very inefficient way since the location of the variables and |
---|
| 433 | parameters within the stylesheet tree is still done at run time (it really |
---|
| 434 | should be done statically at compile time), and I am still unsure that my |
---|
| 435 | understanding of the template variables and parameter scope is actually |
---|
| 436 | right.</p> |
---|
| 437 | <p>This part of the documentation is still to be written once this part of |
---|
| 438 | the code will be stable. <span style="background-color: #FF0000">TODO</span> |
---|
| 439 | </p> |
---|
| 440 | <h3><a name="Extension">Extension support</a></h3> |
---|
| 441 | <p>There is a separate document explaining <a href="extensions.html">how the |
---|
| 442 | extension support works</a>.</p> |
---|
| 443 | <h3><a name="Futher">Further reading</a></h3> |
---|
| 444 | <p>Michael Kay wrote <a href="http://www-106.ibm.com/developerworks/library/x-xslt2/?dwzone=x?open&l=132%2ct=gr%2c%2Bp=saxon">a |
---|
| 445 | really interesting article on Saxon internals</a> and the work he did on |
---|
| 446 | performance issues. I wishes I had read it before starting libxslt design (I |
---|
| 447 | would probably have avoided a few mistakes and progressed faster). A lot of |
---|
| 448 | the ideas in his papers should be implemented or at least tried in |
---|
| 449 | libxslt.</p> |
---|
| 450 | <p>The <a href="http://xmlsoft.org/">libxml documentation</a>, especially <a href="http://xmlsoft.org/xmlio.html">the I/O interfaces</a> and the <a href="http://xmlsoft.org/xmlmem.html">memory management</a>.</p> |
---|
| 451 | <h3><a name="TODOs">TODOs</a></h3> |
---|
| 452 | <p>redesign the XSLT stack frame handling. Far too much work is done at |
---|
| 453 | execution time. Similarly for the attribute value templates handling, at |
---|
| 454 | least the embedded subexpressions ought to be precompiled.</p> |
---|
| 455 | <p>Allow output to be saved to a SAX like output (this notion of SAX like API |
---|
| 456 | for output should be added directly to libxml).</p> |
---|
| 457 | <p>Implement and test some of the optimization explained by Michael Kay |
---|
| 458 | especially:</p> |
---|
| 459 | <ul> |
---|
| 460 | <li>static slot allocation on the stack frame</li> |
---|
| 461 | <li>specific boolean interpretation of an XPath expression</li> |
---|
| 462 | <li>some of the sorting optimization</li> |
---|
| 463 | <li>Lazy evaluation of location path. (this may require more changes but |
---|
| 464 | sounds really interesting. XT does this too.)</li> |
---|
| 465 | <li>Optimization of an expression tree (This could be done as a completely |
---|
| 466 | independent module.)</li> |
---|
| 467 | </ul> |
---|
[18542] | 468 | <p></p> |
---|
[18214] | 469 | <p>Error reporting, there is a lot of case where the XSLT specification |
---|
| 470 | specify that a given construct is an error are not checked adequately by |
---|
| 471 | libxslt. Basically one should do a complete pass on the XSLT spec again and |
---|
| 472 | add all tests to the stylesheet compilation. Using the DTD provided in the |
---|
| 473 | appendix and making direct checks using the libxml validation API sounds a |
---|
| 474 | good idea too (though one should take care of not raising errors for |
---|
| 475 | elements/attributes in different namespaces).</p> |
---|
| 476 | <p>Double check all the places where the stylesheet compiled form might be |
---|
| 477 | modified at run time (extra removal of blanks nodes, hint on the |
---|
| 478 | xsltCompMatch).</p> |
---|
[18542] | 479 | <p></p> |
---|
[18214] | 480 | <p><a href="bugs.html">Daniel Veillard</a></p> |
---|
| 481 | </td></tr></table></td></tr></table></td></tr></table></td> |
---|
| 482 | </tr></table></td></tr></table> |
---|
| 483 | </body> |
---|
| 484 | </html> |
---|