Introduction to XSLT and Advanced XPath
R. Alexander Milowski
milowski@sims.berkeley.edu
School of Information Management and Systems
#1
Motivation
What's the most common thing you want to do with XML once you've got it?
If your document has this wealth of information, you might want to... ?
What happens when things change?
#2
Motivation - Producing HTML or XHTML
HTML is one of the built-in output methods defined by the XSLT recommendation.
Producing HTML is really just an XML-to-XML transformation.
The serializer of the XSLT processor takes care of the HTML specifics.
#3
Motivation - Extracting XML
XSLT allows you to extract information from your documents.
You can also transform this information into other kinds of XML.
It is a kind of "transformational semantic".
#4
Motivation - XML to XML
You can translate your XML to other XML vocabularies.
This is useful for "upgrading" or "downgrading" your XML between versions.
You can do more than just transliteration of your XML.
#5
Motivation - non-XML Output
XSLT's architecture allows non-XML output.
"text" (no markup) is built in.
You can also define your own output methods, but you have to write code.
#6
Motivation - Extensible
XSLT's architecture is extensible.
You can add new processing semantics to XPath and XSLT.
The syntax is the same but you can define your own semantics (within reason).
...but you have to write code (e.g. Java, C++, etc.).
#7
History of XSLT
XSLT is probably one of the most successful W3C recommendations.
It was published in November 1999.
The W3C lists the number of processors as "XSLT: too many to list here."
Both IE and Netscape have some kind of XSLT support in the browser.
#8
XSLT is not XSL
Just to confuse you...
XSLT: XSL Transformations
XSL: eXtensible Stylesheet Language
XSL is for formatting XML documents for print or browser display.
XSLT is for transforming XML documents for whatever purpose.
#9
The XSLT Model
XSLT transforms infosets to infosets using rules.
Rules are packaged in a stylesheet and consist of patterns and actions.
#10
Getting Started
A transformation is specified by a "stylesheet" or "transform" document:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0' > <!-- your rules here --> </xsl:transform>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0' > <!-- your rules here --> </xsl:stylesheet>
The document element name has no effect on the outcome.
Applications may interpret 'stylesheet' differently from 'transform' in terms of:
whether they run the transformation
what they do with the results.
#11
The Top Level
The "Top Level" refers to the children of the document element.
Any element can occur at the top level, but it must be in a namespace:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'
               xmlns:my="http://cde.berkeley.edu/my/other/stuff">
  <!-- your rules here -->
  <my:other-stuff type="random-crap"/>
</xsl:transform>
But this is illegal:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'> <!-- your rules here --> <mr-no-namespace-name/> </xsl:stylesheet>
Typically, you'll use elements from the XSLT namespace.
#12
A Simple Example
This stylesheet generates an XHTML document with the "text" of the document:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'>
  <xsl:template match="/">
    <html xmlns='http://www.w3.org/1999/xhtml'>
      <head><title>Your Document's Value</title></head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:transform>
Here's the stylesheet: text-only.xsl
Here are the results: simple.xml (input) (output), mouse-annotation.xml (input) (output), namespace.xml (input) (output)
#13
The Template Model
Templates are "pictures" of the result.
They are associated with the input document by matching patterns.
Elements in the XSLT namespace are replaced by their actions.
Literal elements are copied to the output.
Example:
<xsl:template match="para"> <p><xsl:apply-templates/></p> </xsl:template>
#14
xsl:apply-templates
xsl:apply-templates matches templates to selected nodes.
By default, the children of the current node are selected:
<xsl:apply-templates/>
You can select different nodes with an XPath expression in the 'select' attribute:
<xsl:apply-templates select="section/title"/>
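The two forms can be sketched side by side. This is only an illustration: the 'doc', 'section', 'title' and 'para' element names and the output element names are assumptions, not part of any particular vocabulary.

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical input:
       <doc>
         <section><title>One</title><para>Text</para></section>
         <section><title>Two</title><para>More</para></section>
       </doc> -->

  <xsl:template match="doc">
    <out>
      <toc>
        <!-- Explicit selection: only the section titles are processed here. -->
        <xsl:apply-templates select="section/title"/>
      </toc>
      <body>
        <!-- No select: the child nodes of 'doc' are selected, and the
             built-in rules then recurse into elements without a template. -->
        <xsl:apply-templates/>
      </body>
    </out>
  </xsl:template>

  <xsl:template match="title">
    <entry><xsl:apply-templates/></entry>
  </xsl:template>

</xsl:transform>
```

The 'title' template is used in both places because matching depends only on the node being processed, not on which xsl:apply-templates selected it.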
#15
Template Results
#16
Patterns
The 'match' attribute of a template requires a subset of XPath.
You can only match along the child or attribute axes (the '//' abbreviation is also allowed).
Predicates are not restricted.
You can separate alternative patterns with '|' so that one template matches any of them.
Example:
contents/p
slide[@show='true']
@href
a/@href
graphic[ancestor::section]
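As a rough sketch of how such patterns sit in 'match' attributes, the stylesheet below reuses the pattern examples from this slide. The output elements ('div') and the template bodies are assumptions made for illustration.

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- One template for two alternative patterns: 'p' children of
       'contents', or any 'slide' whose 'show' attribute is 'true'. -->
  <xsl:template match="contents/p | slide[@show='true']">
    <div><xsl:apply-templates/></div>
  </xsl:template>

  <!-- Attributes are not selected by default, so the 'a' template
       selects its 'href' attribute explicitly. -->
  <xsl:template match="a">
    <xsl:apply-templates select="@href"/>
  </xsl:template>

  <!-- Attribute pattern: matches 'href' attributes of 'a' elements. -->
  <xsl:template match="a/@href">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:transform>
```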
#17
Multiple Templates
If multiple templates match, the "most specific" is taken.
Single names (e.g. "para") have priority 0.
Namespace wildcards (e.g. h:*) have priority -0.25.
Other bare node tests (e.g. *, @*, comment(), node()) have priority -0.5.
Otherwise, the priority is 0.5.
#18
Multiple Templates - Priorities
For example:
para → 0
h:* → -0.25
* → -0.5
node() → -0.5
contents/para → 0.5
contents/* → 0.5
You can adjust the priority to get what you want with a 'priority' attribute.
Example:
<xsl:template match="h:*" priority="1"> ... </xsl:template>
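A small sketch of how default and explicit priorities interact, assuming a hypothetical 'draft' attribute on 'para' and made-up output markup:

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Default priority 0.5: chosen for 'para' children of 'contents'. -->
  <xsl:template match="contents/para">
    <p class="lead"><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Default priority 0: chosen for every other 'para'. -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Explicit priority 1 beats both defaults for any 'para' with a
       'draft' attribute; the empty template produces no output. -->
  <xsl:template match="para[@draft]" priority="1"/>

</xsl:transform>
```

Without the explicit priority, 'para[@draft]' would default to 0.5 and tie with 'contents/para' for a draft paragraph inside 'contents'.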
#19
Outputting Text
Any text inside a template that contains a non-whitespace character is automatically copied to the output.
Whitespace in a text node that also contains a non-whitespace character is copied along with it:
<xsl:template match="foo"> some text </xsl:template>
Whitespace between elements is "stripped" and not copied:
<xsl:template match="foo"> <p>some text</p> </xsl:template>
generates the following without a leading or trailing carriage return:
<p>some text</p>
#20
Preserving Whitespace
The element 'xsl:text' preserves text and whitespace.
For example, to preserve the whitespace in the previous example:
<xsl:template match="foo"><xsl:text>&#10;</xsl:text><p>some text</p><xsl:text>&#10;</xsl:text></xsl:template>
This is often used to add whitespace between non-literal elements.
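For instance, a sketch of xsl:text used as a separator between two instructions, assuming a hypothetical 'person' element with 'first' and 'last' children:

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical input:
       <person><first>Ada</first><last>Lovelace</last></person> -->

  <xsl:template match="person">
    <name>
      <xsl:value-of select="first"/>
      <!-- Without xsl:text, the whitespace between the two instructions
           would be stripped and the names would run together. -->
      <xsl:text> </xsl:text>
      <xsl:value-of select="last"/>
    </name>
  </xsl:template>

</xsl:transform>
```

For the input shown in the comment, this produces <name>Ada Lovelace</name>.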
#21
Literal Elements
A literal element is any element in a template that is not in the XSLT namespace (and is not an extension element).
It generates a copy of itself in the output.
The children may be generated by subsequent templates.
For example:
<xsl:template match="foo">
  <html>
    <head><title>My Document</title>
      <style type='text/css'>...</style>
    </head>
    <body><xsl:apply-templates/></body>
  </html>
</xsl:template>
#22
Attribute Value Templates
XPath expressions can be used to "insert" content into attribute values.
Attribute value templates are delimited by curly braces: {...}
Double curly braces are used if you want a curly brace in the attribute value.
The expression result becomes the attribute value.
#23
Attribute Value Templates - Example
For example:
<img src="{@base-uri}/{@src}"/>
for the content:
<image-data base-uri="http://mydomain.com" src="picture.jpg"/>
would generate:
<img src="http://mydomain.com/picture.jpg"/>
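The doubled-brace escape can be sketched on the same content. The 'alt' attribute and its text are made up for illustration.

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="image-data">
    <!-- {@src} is evaluated as XPath; {{ and }} each yield one literal
         brace. The 'alt' text is a made-up illustration. -->
    <img src="{@base-uri}/{@src}" alt="{@src} {{placeholder}}"/>
  </xsl:template>

</xsl:transform>
```

For the 'image-data' element above, the 'alt' attribute value becomes: picture.jpg {placeholder}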
#24
What Happened to my Comments?
Comments and processing instructions in the stylesheet are ignored.
They aren't copied to the output.
For example:
<xsl:template match="foo"> <!-- The next element is significant --> <spam type='fried'/> </xsl:template>
generates:
<spam type='fried'/>
#25
Understanding Actions
Templates really specify actions.
A literal element or text is really an action to create a copy.
There are many other kinds of actions specified by elements in the XSLT namespace:
apply-templates, call-template, apply-imports, for-each, value-of, copy-of, number, choose, if, text, copy, variable, message, fallback, processing-instruction, comment, element, attribute.
XSLT is extensible: a processor can add to these.
#26
Creating Elements "Manually"
Elements can also be created by xsl:element.
This is used when the element name or namespace is computed from an expression.
An example:
<xsl:element name="top"> <a/><b/><c/> </xsl:element>
constructs:
<top> <a/><b/><c/> </top>
The children of 'xsl:element' are the children of the newly created element.
You can use expressions in the name:
<xsl:element name="{@name}"/>
#27
Creating Attributes "Manually"
Attributes can also be created by xsl:attribute.
They must be created before children are added to the element.
You can use them on literal elements:
<section> <xsl:attribute name="id">sect1</xsl:attribute> </section>
Or on xsl:element constructions:
<xsl:element name="section"> <xsl:attribute name="id">sect1</xsl:attribute> </xsl:element>
The content of xsl:attribute must produce only text nodes, which become the value of the attribute.
You can use expressions in the name:
<xsl:attribute name="{child/@ref}"/>
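A minimal sketch that computes an attribute value from the input, assuming a hypothetical 'link' element with a 'uri-ref' attribute:

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical input:
       <link uri-ref="http://example.com/">Example</link> -->

  <xsl:template match="link">
    <a>
      <!-- Attributes come first, before any child content is added. -->
      <xsl:attribute name="href">
        <xsl:value-of select="@uri-ref"/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </a>
  </xsl:template>

</xsl:transform>
```

For the input in the comment, this produces <a href="http://example.com/">Example</a>. The whitespace around xsl:value-of is stripped, so it does not end up in the attribute value.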
#28
Creating Comments & Processing Instructions
Comments are created by:
<xsl:comment> your comment text here </xsl:comment>
Processing instructions are created by:
<xsl:processing-instruction name="target"> your PI text here </xsl:processing-instruction>
#29
Values
You can get values of elements or attributes via xsl:value-of:
<xsl:value-of select="person/name"/> <xsl:value-of select="@href"/>
The select attribute can contain any XPath expression.
The value is the result of collecting the text descendants of the selected node.
This is really the same as the string() function being applied to the resulting node set, so only the first node in document order contributes.
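In XSLT 1.0, string() applied to a node set uses only the first node in document order, which is easy to miss. A sketch, assuming a hypothetical 'people' input and made-up output element names:

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical input:
       <people>
         <person><name>Ann</name></person>
         <person><name>Bob</name></person>
       </people> -->

  <xsl:template match="people">
    <result>
      <!-- The node set holds two 'name' nodes; only the first one
           is converted to a string. -->
      <first><xsl:value-of select="person/name"/></first>
      <!-- xsl:for-each visits every selected node in turn. -->
      <all>
        <xsl:for-each select="person/name">
          <n><xsl:value-of select="."/></n>
        </xsl:for-each>
      </all>
    </result>
  </xsl:template>

</xsl:transform>
```

Here 'first' contains only Ann, while 'all' contains one 'n' element each for Ann and Bob.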
#30
Copying Nodes
Sometimes you might want to copy a node to the output.
xsl:copy copies the current node to the output.
It only applies to the current node and not its children or attributes.
Example:
<xsl:template match="credit-card"> <xsl:copy>XXXX-XXXX-XXXX-XXXX</xsl:copy> </xsl:template>
would create:
<credit-card>XXXX-XXXX-XXXX-XXXX</credit-card>
#31
The Identity Transform
You can specify the identity transformation with xsl:copy:
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>
This matches and copies all nodes to the output.
#32
xsl:copy-of
You can also just copy whole nodes and their substructure to the output.
xsl:copy-of has a syntax like xsl:value-of:
<xsl:copy-of select="p"/>
All attributes, children, etc. are copied to the output.
xsl:copy and xsl:copy-of have very different uses.
xsl:copy is often used when you might want to convert a few elements and copy the rest.
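The difference can be sketched on one small input. The 'doc' and 'p' input and the 'out', 'deep' and 'shallow' wrapper elements are assumptions for illustration.

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical input:
       <doc><p class="x">Hello <b>world</b></p></doc> -->

  <xsl:template match="doc">
    <out>
      <!-- Deep copy: 'p' with its attribute, text, and 'b' child. -->
      <deep><xsl:copy-of select="p"/></deep>
      <!-- Hand 'p' to the template below, which copies it shallowly. -->
      <shallow><xsl:apply-templates select="p"/></shallow>
    </out>
  </xsl:template>

  <xsl:template match="p">
    <!-- Shallow copy: an empty 'p' with no attributes or children. -->
    <xsl:copy/>
  </xsl:template>

</xsl:transform>
```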
#33
Making Small Changes with xsl:copy
You can change substructure with identity and a few templates:
This changes the 'href' attribute to 'uri-ref' and changes the 'a' element to 'link'.
<xsl:template match="@href">
  <xsl:attribute name="uri-ref"><xsl:value-of select='.'/></xsl:attribute>
</xsl:template>
<xsl:template match="a">
  <link>
    <xsl:apply-templates select="@*|node()"/>
  </link>
</xsl:template>
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>
You can try this out yourself: small-changes.xsl (input) (output)
#34
More XPath
We'll cover:
predicates in more detail
functions
positions
#35
Datatypes
XPath has four basic datatypes: node sets, strings, numbers, and booleans.
All numbers are floating point numbers.
String literals must be quoted: 'value' or "value"
#36
Functions
You can make function calls in expressions and predicates.
Functions can take parameters and they always return values.
Parameters can be expressions (e.g. person/@href ).
Datatypes are implicitly converted to the type required by a parameter.
#37
Functions - Nodes
There are some basic node functions:
last() - returns the index of the last node in the current node set.
position() - returns the index of the current node in the current node set.
count(node-set) - returns the size of the node set.
local-name(node-set?) - returns the local name of a node.
namespace-uri(node-set?) - returns the namespace name of a node.
name(node-set?) - returns the QName of a node.
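A common use of position() and last() is putting separators between items. A sketch, assuming a hypothetical 'list' of 'item' elements and text output:

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- Hypothetical input:
       <list><item>a</item><item>b</item><item>c</item></list> -->

  <xsl:template match="list">
    <xsl:value-of select="count(item)"/>
    <xsl:text> items: </xsl:text>
    <xsl:for-each select="item">
      <xsl:value-of select="."/>
      <!-- position() and last() refer to the nodes selected by
           xsl:for-each; positions start at 1. -->
      <xsl:if test="position() != last()">
        <xsl:text>, </xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:transform>
```

For the input in the comment, the output is: 3 items: a, b, c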
#38
Functions - Strings
There are some basic string functions:
concat(string,string,string*)
starts-with(string,string)
contains(string,string)
substring-before(string,string)
substring-after(string,string)
substring(string,number,number?)
string-length(string?)
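A few of these functions in use, as a sketch on a hypothetical 'file' element with a 'name' attribute:

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- Hypothetical input: <file name="report.final.xml"/> -->

  <xsl:template match="file">
    <!-- Everything before the first '.' -->
    <xsl:value-of select="substring-before(@name, '.')"/>
    <xsl:text>|</xsl:text>
    <!-- Everything after the first '.' -->
    <xsl:value-of select="substring-after(@name, '.')"/>
    <xsl:text>|</xsl:text>
    <!-- Three characters starting at position 1 (positions start at 1) -->
    <xsl:value-of select="substring(@name, 1, 3)"/>
    <xsl:text>|</xsl:text>
    <!-- A boolean result is written as the string 'true' or 'false' -->
    <xsl:value-of select="contains(@name, 'final')"/>
    <xsl:text>|</xsl:text>
    <!-- The number from string-length is converted to a string for concat -->
    <xsl:value-of select="concat('len=', string-length(@name))"/>
  </xsl:template>

</xsl:transform>
```

For the input in the comment, the output is: report|final.xml|rep|true|len=16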
#39
Functions in General
There are many functions.
You can look them up in your XSLT book.
Sometimes processors provide their own "extension" functions.
You just need to understand how type conversion works in XSLT/XPath.
#40
Predicate Semantics
Predicates result in an expression value that is converted to a boolean.
If the value is a number, then the position is checked against that number.
These are equivalent:
slide[2]
slide[position()=2]
Otherwise, the boolean() function is applied to convert its value to a boolean:
A node set is true if it is non-empty.
A string is true if it has a non-zero length.
Boolean expressions (and, or, not()) are already boolean values.
#41
Predicates - Operators
Conditions in a predicate can be combined with the boolean operators 'or' and 'and':
a[@name or @href]
word[syn and @link]
Negation is a function:
a[not(@name)]
Values can be compared with: =, !=, <, >, <=, >=
a[@name='termlink']
word[@type!='noun' and position()>1]
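One practical detail when these operators appear in a stylesheet: the expression lives inside an XML attribute, so '<' has to be escaped. A sketch, assuming a hypothetical 'catalog' of 'computer' elements with a 'price' child and an optional 'discontinued' attribute:

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical input:
       <catalog>
         <computer><price>900</price></computer>
         <computer discontinued="yes"><price>500</price></computer>
         <computer><price>1500</price></computer>
       </catalog> -->

  <xsl:template match="catalog">
    <cheap>
      <!-- The less-than operator is written as &lt; in the attribute. -->
      <xsl:copy-of select="computer[price &lt; 1000 and not(@discontinued)]"/>
    </cheap>
  </xsl:template>

</xsl:transform>
```

For the input in the comment, only the first 'computer' is copied.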
#42
Multiple Predicates
Predicates can be specified separately:
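a[@name][@href]
slide[1][notes]
The result of each predicate is filtered by the next.
These have equivalent forms as 'and' predicates:
a[@name and @href]
slide[position()=1 and notes]
The equivalence holds here because no positional predicate follows another predicate; slide[notes][1] would instead select the first slide that has notes.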
Separate predicates may be easier to write.
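Order matters once a position is involved, because each predicate renumbers the nodes that survive the previous one. A sketch, assuming a hypothetical 'deck' of slides and made-up output element names:

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical input:
       <deck>
         <slide/>
         <slide><notes/></slide>
         <slide><notes/></slide>
       </deck> -->

  <xsl:template match="deck">
    <out>
      <!-- The first slide, kept only if it has notes: no match here. -->
      <a><xsl:value-of select="count(slide[1][notes])"/></a>
      <!-- Slides with notes, then the first of those: the second slide. -->
      <b><xsl:value-of select="count(slide[notes][1])"/></b>
    </out>
  </xsl:template>

</xsl:transform>
```

For the input in the comment, 'a' contains 0 and 'b' contains 1.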
#43
Predicates and Steps
Predicates don't have to be at the end:
computer[price]/specs
slide[citation]/cite[@ref]
The "filter" is applied at each step.
Keep in mind that predicates have additional cost and templates may be faster:
<xsl:template match="slide[citation]"> <xsl:apply-templates select="cite[@ref]"/> </xsl:template>
#44
Predicates - Positions
Positions are relative to what you selected in the step.
They are not necessarily the sibling positions in the input.
For example, given:
<top> <a/><b/><a/><b/> </top>
the expressions:
top/a[1]
top/a[2]
top/b[1]
top/b[2]
select all the element children of 'top'. These do not, because there is no third 'a' or fourth 'b' child:
top/a[1]
top/b[2]
top/a[3]
top/b[4]