Namespaces and Introduction to XPath
R. Alexander Milowski
milowski@sims.berkeley.edu
School of Information Management and Systems
#1
Motivation for Namespaces
When all you have is a name:
What if I want to mix my markup with yours?
How do I associate semantics with mixed markup?
How do I associate a schema (or rules) with markup?
#2
Problems - Example 1
These are certainly different:
<date>1/27</date> <date><year>2004</year><day>1</day><month>27</month></date>
What if they are both correct?
#3
Problems - Example 2
How about:
<order> <item id="i1"/> <item id="i2"/> <item id="i3"/> </order>
How many different interpretations are there of 'order'?
#4
Solutions?
What do we need?
What is "in-scope" for XML syntactically?
What is "out of scope" for XML syntactically?
#5
Names and Namespaces
Names are broken into two parts:
prefix - an identifier for a namespace
local name - an identifier for a name in that namespace
These parts are separated by a colon and called QNames (Qualified Names).
Example:
<c:pseudocode> <c:comment xlink:href="http://somewhere..."/> </c:pseudocode>
This applies only to element and attribute names.
Any prefix containing 'xml' is reserved for the W3C.
#6
Prefixes and Namespaces
Prefixes must be declared by associated prefixes with namespaces "before" they are used.
You can only declare this association on elements.
The syntax is: xmlns:prefix="some:uri"
Example:
<c:pseudocode xmlns:c="urn:publicid:IDN+mathdoc.org:pseudocode:en"> <c:comment xlink:href="http://somewhere..." xmlns:xlink="http://www.w3.org/..."/> </c:pseudocode>
"before" means on the element where the prefix occurs or on an ancestor element.
The 'xml' prefix is pre-defined to be http://www.w3.org/XML/1998/namespace
#7
Defaulting Namespaces
Namespaces can be defaulted.
This applies to only element names without prefixes.
The syntax is: xmlns="some:uri"
Example:
<c:pseudocode xmlns:c="urn:publicid:IDN+mathdoc.org:pseudocode:en"> <c:comment xmlns="http://www.w3.org/1999/xhtml"> <p>This code does the following:</p> ... </c:comment> </c:pseudocode>
You can undeclare the default: xmlns=""
#8
Namespace Names and URI Values
The namespace name is a URI (Uniform Resource Identifier).
URI values can be web address (e.g. http://youdomain.com/...)
web locations: http://yourdomain.com/...
URN (names): urn:...
Or other schemes: scheme:scheme-specific-part
But the processor uses the exact string as the URI value and not URI equivalences:
http://www.somwhere.com/A
http://www.somwhere.com/%41
#9
Declarations have Scope
A namespace declaration's scope is the element where it occurs.
There is no different between declarations on the root element and elsewhere.
The element, its attributes, and its children may use that prefix in their names.
You can re-define namespaces.
#10
What's a Name?
It's not what you think.
The prefix is just an abbreviation of the actual namespace name (i.e. the value of the declaration).
A name now consists of two parts:
namespace name - the name of the namespace associated with the prefix.
local name - the part of the name after the colon.
The prefix is now non-critical syntax.
#11
The Infoset "Changes"
Namespace processing is optional.
When used, names change.
But this only affects elements and attributes.
You can get different property values depending on whether you process namespaces.
Names now have two parts (i.e. local names and namespace names).
Namespace declarations and their scopes occur.
#12
The Infoset "Changes" - Element
[namespace name] - the namespace name or "no value" if there isn't one.
[local name] - the local part of the name (i.e. after the colon).
[prefix] - The namespace prefix used on the element or "no value" if there isn't one.
[in-scope namespaces] - An unordered list of namespace info items.
[namespace attributes] - An unordered list of attribute information items corresponding to the namespace declaration attributes actually on this element.
#13
The Infoset "Changes" - Attribute
[namespace name] - the namespace name or "no value" if there isn't one.
[local name] - the local part of the name (i.e. after the colon).
[prefix] - The namespace prefix used on the attribute or "no value" if there isn't one.
#14
How it Works - An Example
The simplest thing is to default the namespace:
<order xmlns="urn:publicid:IDN+cde.berkeley.edu:example:order:en"> <from>me</from> <to>you</to> <items> <item>A clue</item> <item>A better clue!</item> </items> <memo> <p>I need this order now!</p> </memo> </order>
#15
How it Works - An Example - Part 2
But we can use a prefix and get the exact same names:
<o:order xmlns:o="urn:publicid:IDN+cde.berkeley.edu:example:order:en"> <o:from>me</o:from> <o:to>you</o:to> <o:items> <o:item>A clue</o:item> <o:item>A better clue!</o:item> </o:items> <o:memo> <p>I need this order now!</p> </o:memo> </o:order>
#16
How it Works - An Example - Part 3
If we change that prefix binding later on, we'll get different names:
<o:order xmlns:o="urn:publicid:IDN+cde.berkeley.edu:example:order:en"> <o:from>me</o:from> <o:to>you</o:to> <o:items xmlns:o="urn:publicid:IDN+cde.berkeley.edu:example:order:items:en"> <o:item>A clue</o:item> <o:item>A better clue!</o:item> </o:items> <o:memo> <p>I need this order now!</p> </o:memo> </o:order>
#17
How it Works - An Example - Part 4
We can also mix in a default if we wish:
<o:order xmlns:o="urn:publicid:IDN+cde.berkeley.edu:example:order:en"> <o:from>me</o:from> <o:to>you</o:to> <o:items xmlns:o="urn:publicid:IDN+cde.berkeley.edu:example:order:items:en"> <o:item>A clue</o:item> <o:item>A better clue!</o:item> </o:items> <o:memo xmlns="http://www.w3.org/1999/xhtml"> <p>I need this order now!</p> </o:memo> </o:order>
#18
Namespace Info Item
Namespace declarations are encoded as Namespace Info Items.
They have two properties:
[prefix] - the prefix whose binding this info item describes.
[namespace name] - the namespace name bound to this prefix.
These are not the namespace declarations.
Serialization of XML Infosets with namespaces is tricky.
#19
In-Scope Namespaces
Every namespace binding that is "in-scope" is listed in the element's [in-scope namespaces] property.
Again, this is not just those declared.
You have to look at the differences between parent and child to see what has changed.
This property is only on elements.
The prefixes 'xml' and 'xmlns' are always defined:
xml → http://www.w3.org/XML/1998/namespace
xmlns → http://www.w3.org/2000/xmlns/
The document implicitly defines 'xml' and 'xmlns'.
#20
And now for something completely different...
We're now actually going to do something with XML...
...but we need some tools.
#21
XPath
XPath is a syntax for "addressing" into a document.
They are "path expressions".
It allows you to expression things like:
The "para" child element of the "contents" element.
The next sibling of the "item" element.
The "item" element where the attribute "overbid" has value "true".
It is its own "mini standard" used by many specifications.
#22
Like Directory Paths
XPath expressions have a directory-path-like syntax.
A single "/" (forward slash) represents the Document info item--also know as the root.
Subsequent named "steps" in the path represent children:
/doc/title
selects the 'title' child element of the document element 'doc'.
But they don't have to be "rooted":
contents/para
selects the 'para' child element of the 'content' element.
#23
Node Set Results
The result of evaluating an XPath expression is a Node Set.
A node is just another term for "info item".
For example, given the content:
<contents> <para>One</para> <para>Two</para> <para>Three</para> </contents>
the expression:
/content/para
would return three 'para' elements as a set.
#24
Selecting Attributes
You can also select attributes by adding the step: @name
For example, given the content:
<contents> <para><a href="one.html">One</a></para> <para><a href="two.html">Two</a></para> <para><a href="three.html">Three</a></para> </contents>
the expression:
/content/para/a/@href
would return the attribute 'href' of each of the three paragraphs as set.
#25
Names and Namespaces
Any step expression can use a QName: h:body
The prefix binding is defined external to the expression (e.g. application specific).
Matching is based on the local name and namespace name and not the prefix.
We could add namespaces to the previous examples:
/s:contents/d:para /s:contents/d:para/h:a/@href
The application would have to define the prefixes 's', 'd', and 'h'.
#26
No Prefix = No Namespace
A name test without a prefix only matches something without a namespace.
For example:
m:section/title
matches
<m:section xmlns:m='urn:...'> <title>No Namespace</title> </m:section>
but not
<m:section xmlns:m='urn:...' xmlns='urn:something-else...'> <title>I've got a namespace</title> </m:section>
Remember, name matching is based on local name and namespace name alone!!!
#27
Wildcards
The '*' (asterisk) can be used to wildcard names.
Elements: All the elements contained in a 'content' element.
contents/*
Elements: All the attributes of 'para'.
para/@*
Namespaces can also be used:
s:contents/d:* d:para/@h:*
#28
Context Node
Evaluation is always with respect to a context node.
You can address the context node as '.' (period):
For example, the attributes of the context node:
./@*
The context node is implicit.
For example, these are equivalent:
contents/para ./contents/para
The context node does not have to be an element.
#29
Parent and Ancestors
From the context node you can access your parent and ancestors.
Just like directories, '..' represents the parent.
You can go back many levels:
../../section
This selects the 'section' element that is the context node's parent's parent
#30
Conditional Matching
Predicates on the step allow conditions to be specified.
They follow the step and are wrapped in square brackets ('[' and ']').
For example:
para[@id='mine']
selects 'para' elements where the 'id' attribute has value 'mine'.
There is a whole wealth of operators (including boolean logic) that can be used.
You can also have sub-expressions:
contents[para/@id='mine']
This selects a 'contents' element that has a child 'para' with an attribute 'id' of value 'mine'.
#31
Skipping Levels
You can match elements that aren't direct children with the "//" (double forward slash).
This looks through the descendants of the "current context".
For example:
/section//cite
will match all 'cite' elements that are descendants of 'section'. But:
//cite
will match all 'cite' elements in the document.
#32
Special Functions
There are some special functions that can be used as steps.
Function | Result |
---|---|
node() |
Matches any kind of node. |
text() |
Matches text. |
processing-instruction() |
Matches a processing-instruction. |
comment() |
Matches a comment. |
For example:
para/node()
matches all the children of a 'para' element including comments, text, and processing instructions.
#33
The Real Story
This is all just an abbreviated syntax.
There is a lot more...
We'll start by explaining the "axes" of a document.
#34
Trees and XML
For the computer scientists:
Formally, a tree is a connected, acyclic, undirected graph.
It would be nice if XML was a rooted positional n-ary tree.
XML isn't that simple.
But the parent-child relationships in the infoset do form a tree.
Attributes and namespaces mess this up a bit.
#35
Relationships in a Tree
#36
Additional XML Relationships
Attributes:
Each element can have attributes.
Attribute info items aren't children.
Namespaces:
Each element can have in-scope namespaces.
Namespace info items aren't children.
#37
Axes are Directions on Relationships
|
|
#38
Axis Syntax
Steps can be preceded by an axis name and a double colon:
contents/child::para para/preceding-sibling::* ancestor::section/title
If that relationship doesn't exist, you get an empty node set.
#39
Axis Specifics
Each axis has:
A direction of forward or reverse.
A principal node type: one of attribute, namespace, or element
The direction refers to the order in which items will be traversed.
#40
Principal Node Type
The principal node type refers to what a name matches:
On the 'child' axis, a name matches an element.
On the 'attribute' axis, a name matches an attribute.
Principal node type "special cases" the extra relationships:
Only the attribute axis has type 'attribute'.
Only the namespace axis has type 'namespace'.
Everything else has type 'element'.
#41
Axis Direction
Reverse Axes:
ancestor
ancestor-or-self
preceding
preceding-or-self
Everything else has a forward direction.
#42
Axis Direction - Example
For example, give the following
<doc> <a/><b/><c/> <target/> <d/><e/><f/> </doc>
These expressions evaluate:
target/preceding-sibling::* → elements 'c' b' 'a'.
target/following-sibling::* → elements 'd' 'e' 'f'.
#43
Abbreviated Syntax Equivalences
Abbreviation | Equivalence |
---|---|
../name |
parent::name |
name |
child::name |
//name |
descendant::name |
. |
self::node() |
* |
child::* |
@* |
attribute::* |
@name |
attribute::name |
#44
What use is this?
XPath is used extensively in XSLT to process and transform XML documents:
<xsl:transform version='1.0' xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html> <head><xsl:apply-templates select="doc/title"/></head> <body> <xsl:apply-templates select="doc/contents"/> </body> </html> </xsl:template> </xsl:transform>
XML Schema and other standards use this for similar matching needs.
Many programming APIs & commercial products provide XPath for traversing and manipulating XML.
#45
Next Time
There's still more XPath to discuss.
But we will introduce XSLT first.
Then we will dig deeper in XPath semantics in relation to XSLT.