Bottom Up and Top Down: Introduction to XML Schema
R. Alexander Milowski
milowski at sims.berkeley.edu
#1
Bottom Up: An Example
We've got our pizza orders in XML...
<orders xmlns='urn:publicid:IDN+cde.berkeley.edu:schemas:examples:200402:us'>
  <order taken-on="2004-02-23T18:43:00-08:00">
    <pizza>
      <toppings>
        <kind>Jalapenos</kind>
        <kind>Pepperoni</kind>
        <kind>Feta</kind>
        <kind>Garlic</kind>
      </toppings>
    </pizza>
  </order>
  <order taken-on="2004-02-23T19:22:00-08:00">
    <pizza>
      <toppings>
        <kind>Ham</kind>
      </toppings>
    </pizza>
  </order>
  <order taken-on="2004-02-23T19:26:00-08:00">
    <pizza>
      <half>
        <toppings>
          <kind>Green Olives</kind>
        </toppings>
      </half>
    </pizza>
  </order>
</orders>
#2
An Example - Continued
And our deliveries in XML as well...
<delivery xmlns='urn:publicid:IDN+cde.berkeley.edu:schemas:examples:200402:us'>
  <order taken-on="2004-02-23T18:43:00-08:00" delivered-on="2004-02-23T19:15:00-08:00">
    <pizza>
      <toppings>
        <kind>Jalapenos</kind>
        <kind>Pepperoni</kind>
        <kind>Feta</kind>
        <kind>Garlic</kind>
      </toppings>
    </pizza>
  </order>
</delivery>
#3
Create the Schema Document
An XML Schema is an XML document:
Namespace: http://www.w3.org/2001/XMLSchema
Document Element: schema
The attribute 'targetNamespace' specifies the namespace that the schema defines.
So we start with:
<xs:schema
    xmlns:xs='http://www.w3.org/2001/XMLSchema'
    targetNamespace='urn:publicid:IDN+cde.berkeley.edu:schemas:examples:200402:us'
    xmlns:my='urn:publicid:IDN+cde.berkeley.edu:schemas:examples:200402:us'>
  ...
</xs:schema>
NOTE: These slides will always use 'xs' for the prefix of the schema namespace.
#4
The Schema Concepts
We need to declare our elements by defining types that describe their structure.
At the "top-level", each declaration or definition will be accessible to applications outside our schema.
A simple strategy is to declare each element we see in our example documents.
Attributes are part of those elements, so we'll get those along the way.
#5
The 'orders' Element
The 'orders' element is a sequence of order elements.
Maybe there aren't any orders, so it could be empty.
That looks something like this:
<xs:element name="orders">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="my:order" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Note that the 'my' prefix is bound to the targetNamespace value of the schema. This means that the element 'my:order' must be declared in this schema.
#6
The 'order' Element
On the way in, 'order' looks like:
<order taken-on="2004-02-23T19:22:00-08:00">
  <pizza>
    <toppings>
      <kind>Ham</kind>
    </toppings>
  </pizza>
</order>
While being delivered:
<order taken-on="2004-02-23T19:22:00-08:00" on-its-way='true'>
  <pizza>
    <toppings>
      <kind>Ham</kind>
    </toppings>
  </pizza>
</order>
When delivered:
<order taken-on="2004-02-23T19:22:00-08:00" delivered-on="2004-02-23T19:55:00-08:00">
  <pizza>
    <toppings>
      <kind>Ham</kind>
    </toppings>
  </pizza>
</order>
#7
The 'order' Element Declaration
The XML Schema elves cooked this up overnight while you slept:
<xs:element name="order">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="my:pizza" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="taken-on" type="xs:dateTime" use="required"/>
    <xs:attribute name="on-its-way" type="xs:boolean"/>
    <xs:attribute name="delivered-on" type="xs:dateTime"/>
  </xs:complexType>
</xs:element>
But now we need to declare my:pizza...
#8
The Elves Were Busy...
Good thing we have XML Schema elves:
<xs:element name="pizza">
  <xs:complexType>
    <xs:sequence>
      <xs:choice>
        <xs:element ref="my:toppings"/>
        <xs:sequence>
          <xs:element name="half" form="qualified" minOccurs="1" maxOccurs="2">
            <xs:complexType>
              <xs:sequence>
                <xs:element ref="my:toppings"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="toppings">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="kind" form="qualified" maxOccurs="unbounded" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
#9
Are We Done?
Darn. We forgot that delivery element!
...and the elves have gone on strike.
What should the declaration for 'delivery' look like?
<delivery xmlns='urn:publicid:IDN+cde.berkeley.edu:schemas:examples:200402:us'>
  <order taken-on="2004-02-23T18:43:00-08:00" delivered-on="2004-02-23T19:15:00-08:00">
    <pizza>
      <toppings>
        <kind>Jalapenos</kind>
        <kind>Pepperoni</kind>
        <kind>Feta</kind>
        <kind>Garlic</kind>
      </toppings>
    </pizza>
  </order>
</delivery>
#10
My Answer
Very similar to the 'orders' element but forced to be non-empty:
<xs:element name="delivery">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="my:order" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
#11
Validating with NetBeans
NetBeans supports XML Schema:
Authoring and checking schemas.
Validating documents against schemas.
Catalogs for locating schemas.
That 'double triangle' button will validate a document if it has a schema.
You need to associate the document with the schema... there are better ways that we'll learn next time.
#12
Specifying the Schema Location
There is a special attribute in the 'http://www.w3.org/2001/XMLSchema-instance' namespace called 'schemaLocation'.
The value is a whitespace-separated list of pairs: a namespace name followed by the location of its schema.
For example, the order document:
<orders
    xmlns='urn:publicid:IDN+cde.berkeley.edu:schemas:examples:200402:us'
    xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xsi:schemaLocation='urn:publicid:IDN+cde.berkeley.edu:schemas:examples:200402:us pizza-order.xsd'>
  ...
</orders>
This maps the namespace name of 'orders' to the local schema document 'pizza-order.xsd'.
The syntax is horrible, and if you move the document, the location could break.
We'll learn better ways next time... but try this in NetBeans.
#13
No More Elves
They've refused to write your schemas.
So you've got to learn the syntax, rules, and concepts.
I don't know what they are complaining about.
XML Schema is not that hard!
#14
Top Down: What Are Schemas?
Structure Relations?
Constraints?
Typing?
#15
Schema Language vs Schema Language
Modeling everything is impossible.
Constraints are a slippery slope.
Some languages don't really have a type system.
But the XML Schema language from the W3C does handle a lot of requirements.
Concepts that XML Schema supports will be listed with a check mark.
#16
Structure Relations
What elements are allowed to be document elements?
What children can an element contain?
What attributes can an element have?
What elements have values?
#17
Structure - Children & Models
Elements that contain elements are typically modeled as:
sequences of elements (e.g. A followed by B followed by C)
choices of elements (e.g. A, B, or C)
occurrences (e.g. one or more, between 5 and 10, optional)
all of these elements in any order
some of these elements in any order
Models allow combinations of the above (but there are restrictions).
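As a rough sketch of how these models might look in XML Schema syntax (the element names 'ordered', 'unordered', and A through D are made up for illustration and are not part of the pizza schema):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Sequence, choice, and occurrences combined in one model. -->
  <xs:element name="ordered">
    <xs:complexType>
      <xs:sequence>
        <!-- A comes first, exactly once. -->
        <xs:element name="A" type="xs:string"/>
        <!-- Then B or C, repeated between 5 and 10 times. -->
        <xs:choice minOccurs="5" maxOccurs="10">
          <xs:element name="B" type="xs:string"/>
          <xs:element name="C" type="xs:string"/>
        </xs:choice>
        <!-- Then an optional D. -->
        <xs:element name="D" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Elements in any order: A and B are required, C may be left out. -->
  <xs:element name="unordered">
    <xs:complexType>
      <xs:all>
        <xs:element name="A" type="xs:string"/>
        <xs:element name="B" type="xs:string"/>
        <xs:element name="C" type="xs:string" minOccurs="0"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

One of the restrictions shows up here: in XML Schema 1.0, each element inside 'xs:all' can occur at most once, and 'xs:all' cannot be nested inside a sequence or choice.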
#18
Structure - Mixed Content
Sometimes elements and text occur at the same level:
<p>This is a paragraph with a link to <a href="...">something</a>.</p>
It's called "mixed content".
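A minimal sketch of how the paragraph above could be declared, assuming 'p' may contain text interleaved with any number of 'a' elements (the attribute type chosen for 'href' is my assumption):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="p">
    <!-- mixed="true" allows text between the child elements. -->
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="a">
          <!-- 'a' holds text only, plus an href attribute. -->
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="href" type="xs:anyURI"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```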
#19
Structure - Wildcards
With namespaces, you might want to say:
Anything from a particular namespace goes here.
Anything from a set of namespaces goes here.
Anything except the current namespace.
Anything but a particular namespace goes here.
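Here is a sketch of the first three cases using 'xs:any'. The target namespace 'urn:example:wildcards' and the element names are invented for this example; the XHTML and SVG namespaces just stand in for "some other vocabulary".

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:wildcards">

  <!-- Anything from a particular namespace goes here. -->
  <xs:element name="from-one">
    <xs:complexType>
      <xs:sequence>
        <xs:any namespace="http://www.w3.org/1999/xhtml"
                processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Anything from a set of namespaces goes here. -->
  <xs:element name="from-set">
    <xs:complexType>
      <xs:sequence>
        <xs:any namespace="http://www.w3.org/1999/xhtml http://www.w3.org/2000/svg"
                processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Anything except the current (target) namespace. -->
  <xs:element name="not-mine">
    <xs:complexType>
      <xs:sequence>
        <xs:any namespace="##other"
                processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Note that in XML Schema 1.0 the "anything but" form only exists relative to the target namespace ('##other'); excluding some arbitrary other namespace is not directly expressible. The 'processContents' value 'lax' means matched elements are validated only if a declaration for them can be found.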
#20
Structure - Attributes
What attributes are associated with elements.
But you might want to say:
It is optional.
It has a certain value.
Here's a default value.
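A small sketch of these attribute options in XML Schema syntax. The element 'pizza-box' and its attribute names are hypothetical, chosen only to show one attribute of each kind:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="pizza-box">
    <xs:complexType>
      <!-- Must be present. -->
      <xs:attribute name="size" type="xs:string" use="required"/>
      <!-- May be omitted (this is also the default for 'use'). -->
      <xs:attribute name="note" type="xs:string" use="optional"/>
      <!-- If present, it must have exactly this value. -->
      <xs:attribute name="version" type="xs:string" fixed="1.0"/>
      <!-- If omitted, the processor supplies this value. -->
      <xs:attribute name="crust" type="xs:string" default="thin"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```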
#21
Structure - Validation Control
Controlling how all the children are validated against their schema.
Controlling:
Strict validation
"Process if you got 'em"
"I don't care."
#22
Types of Constraints
Constraining values (e.g. the value of this attribute must be a date).
Cross-field validation.
Cross-document validation.
Referential integrity within the document.
#23
Value Constraints
Simple lists of values.
Data types (e.g. integers, dates, etc.)
Regular expressions.
#24
Cross Referencing Constraints
Keys & Key-refs like in XSLT.
Simple links within a document.
Links external to a document.
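For links within a document, XML Schema offers 'xs:key' and 'xs:keyref'. A sketch under made-up names ('menu', 'topping', 'special'): every 'special' must point at a 'topping' that actually exists in the same 'menu'.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="menu">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="topping" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="id" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
        <xs:element name="special" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="topping" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>

    <!-- Each topping/@id must be present and unique within a menu. -->
    <xs:key name="toppingKey">
      <xs:selector xpath="topping"/>
      <xs:field xpath="@id"/>
    </xs:key>

    <!-- Each special/@topping must match some topping/@id. -->
    <xs:keyref name="specialTopping" refer="toppingKey">
      <xs:selector xpath="special"/>
      <xs:field xpath="@topping"/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```

The selector and field paths are a restricted subset of XPath, and they are scoped to the element that carries the constraint ('menu' here).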
#25
Cross-Field Constraints
Conditional value rules (e.g. if this attribute has value 'x' then this other must have value 'y').
Structural rules (e.g. if this element has this structure, then this other must have another structure).
Mixing the above.
#26
Typing
Type derivation for data types (simple types).
Type derivation for element structures.
Typing vs. use.
#27
Value Constraints as Typing
Built-in data type hierarchy.
Custom derivations (e.g. integers between 1 and 10).
Value enumeration.
Lexical pattern matching (e.g. regular expressions).
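A sketch of the three kinds of custom simple types listed above. The type names and the specific values are illustrative assumptions, not part of the pizza schema:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Custom derivation: integers between 1 and 10. -->
  <xs:simpleType name="oneToTen">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="10"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Value enumeration: only these strings are allowed. -->
  <xs:simpleType name="toppingKind">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Ham"/>
      <xs:enumeration value="Feta"/>
      <xs:enumeration value="Garlic"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Lexical pattern: exactly five digits. -->
  <xs:simpleType name="fiveDigits">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

A declaration would then use one of these by name, for example type="toppingKind" on the 'kind' element instead of xs:string.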
#28
Element Structure Derivation
Extending content models:
Adding to the end?
Adding to the start?
Adding at a specified position?
Adding attributes.
Restricting attributes:
Making optional required.
Excluding optional attributes.
Restricting values.
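To make the questions above concrete, here is a sketch of how XML Schema answers some of them. The types 'person', 'student', and 'enrolledPerson' are hypothetical. Extension always appends to the end of the base content model; restriction repeats the base model with tighter rules.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Base type: a name, plus an optional status attribute. -->
  <xs:complexType name="person">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="status" type="xs:string"/>
  </xs:complexType>

  <!-- Extension: 'major' is added after 'name'; 'year' is a new attribute. -->
  <xs:complexType name="student">
    <xs:complexContent>
      <xs:extension base="person">
        <xs:sequence>
          <xs:element name="major" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="year" type="xs:gYear"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Restriction: the optional 'status' attribute becomes required. -->
  <xs:complexType name="enrolledPerson">
    <xs:complexContent>
      <xs:restriction base="person">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="status" type="xs:string" use="required"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
```

An optional attribute can be excluded in a restriction with use="prohibited".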
#29
Name vs. Type
The distinction is between:
Elements are the types.
Elements have types and types can have names.
Example:
<person><name>Alex</name></person>
Is this an element type named 'person'?
Is this an element named 'person' that is an instance of a type with child structure 'name'?
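XML Schema takes the second view: elements have types, and types can have names. A sketch, where the type name 'personType' and the second element 'author' are made up to show one type being shared by two element names:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A named type describing the structure. -->
  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Two differently named elements that are instances of the same type. -->
  <xs:element name="person" type="personType"/>
  <xs:element name="author" type="personType"/>

</xs:schema>
```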
#30
Polymorphic Content
Example:
<people>
  <student status='freshman'><name>Jimmy Dean</name></student>
  <professor><name>Bob Glushko</name></professor>
  <lacky><name>Alex Milowski</name></lacky>
</people>
Each element 'student', 'professor', and 'lacky' could have a type derived from type 'person'.
The model for 'people' says things of type 'person' go here.
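One possible way to express this in XML Schema is with substitution groups. This is a sketch of that approach, not necessarily the one these slides would settle on; the type names 'personType' and 'studentType' and the abstract head element 'person' are my assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- A student is a person with an extra 'status' attribute. -->
  <xs:complexType name="studentType">
    <xs:complexContent>
      <xs:extension base="personType">
        <xs:attribute name="status" type="xs:string"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Abstract head: never appears itself, only its substitutes do. -->
  <xs:element name="person" type="personType" abstract="true"/>
  <xs:element name="student" type="studentType" substitutionGroup="person"/>
  <xs:element name="professor" type="personType" substitutionGroup="person"/>
  <xs:element name="lacky" type="personType" substitutionGroup="person"/>

  <!-- "Things of type 'person' go here." -->
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="person" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Each member of the substitution group must have a type that is the same as, or derived from, the type of the head element.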
#31
XML Schema
Developed as the result of a number of schema language submissions (SOX, XML Data, etc.).
One main goal was to replicate modeling techniques used in DTDs.
And to provide modern typing and data typing.
All this packaged syntactically as an XML document.
XML Schema became a W3C Recommendation in May 2001.
There is a lot of functionality, so don't get overwhelmed.