Modeling Elements and Attributes
R. Alexander Milowski
milowski at sims.berkeley.edu
#1
Which Comes First?
Do you create an instance and then the schema?
Do you write a schema and then use it to create instance documents?
Keep in mind that the instance is used much more often than the schema.
The instance is Queen. (...there is a story behind this).
#2
Kings, Queens, and Benevolent Dictators
We'll assume that we're in storybook land where everyone loves the King and Queen because they're nice people and nice to people.
But they are still King and Queen and rule the land.
Our neighboring country, DTD Land, has a dictator... sure glad we aren't over there.
#3
DTDs - The Benevolent Dictator
In DTD Land, the document is either valid or invalid.
There is no room for partial validity.
Everything must be declared.
Don't cross the dictator or you'll lose your content!
In the days of SGML, if it wasn't valid, you couldn't load your document.
Hence the label "Dictator".
...but Benevolent Dictator because the point of the DTD was well intentioned.
Lesson: Validity is only one state, and invalid or non-validatable documents are OK.
#4
King Schema
The schema is King.
It defines the rules by which the document can be considered valid.
But the King is smart and flexible and knows a date or integer from a string.
He is also flexible, allowing partial validity, wildcards, and the composition of documents.
His rules are complex but precise.
But cross him in the wrong context and he'll throw you in the dungeon.
Lesson: Validity against a flexible schema language is good and applications can demand that input be valid.
#5
The Instance is Queen
But the Queen is smarter.
Communication of good structured content between willing parties is key to a successful kingdom.
Sometimes that content is ahead of the rules that the King has proclaimed.
The Queen, with her ability to persuade the King, changes the rules of the schema to conform to the instance that the people want to use.
The Lesson:
The instance is far more common, directly usable, and malleable to new requirements. People will change content, adding their own structures, without knowing or caring that the schema does not reflect these new requirements. Taking this into account, schemas should be informed by the content people want to encode rather than restrict what they can encode.
In summary, start with the document you want to model and not with the schema. The schema is an artifact of constructing those instances and describes a class of instances similar to your content.
#6
Namespace Names in Documents
Schemas describe the content associated with the namespaces you use in your documents.
In XML Schema terminology, the namespace name used in the instance is the target namespace.
You can have multiple schemas for the same target namespace.
But you can only use one schema for a namespace during validation.
URLs that start with http://... are just fine for target namespaces. Just make sure you own the domain!
#7
Use a URN for Namespace Names
A URN is a name and doesn't represent a location.
So it is great for document namespaces.
It has a special syntax and starts with the scheme 'urn'.
The parts of the name are colon separated.
Following the scheme is an NID (Namespace ID). NIDs are restricted: you can't make up your own NID!
For example, the namespace for my slide content is:
urn:publicid:IDN+mathdoc.org:schema:slides:2004:1.0:us
#8
The 'publicid' NID
The 'publicid' NID is one that you can use with your domain name.
Start your namespace name with: urn:publicid:IDN+yourdomainhere.com:
The rest you make up with the following rules:
Spaces become '+' (plus sign).
'+' becomes %2B
'/' becomes %2F
See RFC 3151 for more information.
For example, the namespace for my slide content is:
urn:publicid:IDN+mathdoc.org:schema:slides:2004:1.0:us
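As a rough illustration of the escaping rules above, here is a made-up name with a space, a slash, and a plus sign in it. The domain example.org, the name "my schemas/v1+draft", and the element 'doc' are all assumptions for the sake of the example. The '//' to ':' mapping is the one used by the public identifier form of these URNs.

```xml
<!-- Hypothetical public identifier:  IDN example.org//my schemas/v1+draft
     '//' becomes ':'    space becomes '+'
     '/'  becomes '%2F'  '+'   becomes '%2B' -->
<doc xmlns="urn:publicid:IDN+example.org:my+schemas%2Fv1%2Bdraft"/>
```

In practice it is probably easiest to avoid spaces, slashes, and plus signs in the part you make up, so that no escaping is needed.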
#9
Schema Documents vs Schemas
A schema document is the XML syntax of a schema.
You can have multiple schema documents for one schema.
The schema's declarations and definitions are the result of processing all the schema documents for all namespaces used.
#10
Top Level Elements
The top-level elements are the children of the 'schema' element:
include, import, redefine, annotation
element, attribute
simpleType, complexType, group, attributeGroup, notation
These must occur first: include, import, redefine.
Otherwise, any order goes.
The schema element can be empty too.
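A minimal sketch of a schema document that follows these ordering rules might look like the following. The namespace name and the names 'personType', 'person', and 'common' are invented for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="urn:publicid:IDN+example.org:schema:demo"
           targetNamespace="urn:publicid:IDN+example.org:schema:demo">
  <!-- Any include, import, or redefine elements would go here, first. -->

  <!-- After that, the other top-level elements can come in any order.
       Here a type is defined before the element that uses it, and the
       attribute group it refers to is defined later still. -->
  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attributeGroup ref="my:common"/>
  </xs:complexType>
  <xs:element name="person" type="my:personType"/>
  <xs:attributeGroup name="common">
    <xs:attribute name="id" type="xs:ID"/>
  </xs:attributeGroup>
  <xs:annotation>
    <xs:documentation>Illustrative schema only.</xs:documentation>
  </xs:annotation>
</xs:schema>
```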
#11
Declaration vs Definition
This is just terminology used in XML Schema.
Declaration:
A declaration is something used in a document (e.g. elements and attributes).
Definitions:
A definition is something used only by the schema.
#12
Global Names
Schemas have "global" names that are the only starting place.
Type can only come from elements at the top-level.
Each "global" name is associated with a declaration or definition type:
elements
attributes
complex & simple types
groups
attribute groups
notations
That is, a global name in a schema is a pair of a local-name and kind of object.
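One consequence of the "local name plus kind" pairing is that an element and a type can share the same local name without clashing. A small sketch, assuming a made-up namespace and the hypothetical names 'address' and 'street':

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="urn:publicid:IDN+example.org:schema:demo"
           targetNamespace="urn:publicid:IDN+example.org:schema:demo">
  <!-- Global element named 'address'... -->
  <xs:element name="address" type="my:address"/>
  <!-- ...and a global complex type also named 'address'.
       They are different kinds of object, so both names are allowed. -->
  <xs:complexType name="address">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Note that simple and complex types are the same kind of object for this purpose, so a simple type and a complex type cannot share a name.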
#13
Simple vs Complex Types
Simple types are types for values:
numbers: integers, floats, doubles
strings, tokens, names, language codes
dates, times, etc.
Complex types are element structures.
content models
attributes
#14
Predefined Simple Types
Figure 1. The predefined simple types.
#15
Simple Element Declarations
This declares an element of a particular simple type:
<xs:element name="person" type="xs:string"/>
<xs:element name="pubdate" type="xs:date"/>
The 'name' attribute defines the local name value in the target namespace.
The 'type' attribute points to a defined simple type.
You can point to your own simple types too.
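For instance, pointing at your own simple type could look roughly like this. The type name 'ssnType', the pattern, and the namespace are assumptions made up for this sketch, and it uses type restriction, which these slides do not cover.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="urn:publicid:IDN+example.org:schema:demo"
           targetNamespace="urn:publicid:IDN+example.org:schema:demo">
  <!-- A user-defined simple type: a string of the form 123-45-6789. -->
  <xs:simpleType name="ssnType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- The 'type' attribute is a QName, so it needs the 'my' prefix. -->
  <xs:element name="ssn" type="my:ssnType"/>
</xs:schema>
```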
#16
Declaring Elements with Structure
When elements have structure, you need to define the structure.
The structure is defined by a 'complexType' element:
<xs:element name="person">
  <xs:complexType>
    ...
  </xs:complexType>
</xs:element>
The content of 'complexType' defines the structure of the element.
#17
Element Children
An element's content is defined by combining a construct called a 'particle':
'sequence' element - a sequence of children elements.
'choice' element - a choice of children elements.
'element' element - a specification of a child element.
'all' element - any ordering of child elements.
'any' element - any elements from a particular namespace.
This lecture will cover 'sequence', 'choice', and 'element'.
Keep in mind that 'any' and 'element' cannot be used as a direct child of 'complexType'. :(
#18
Using Names in Models
When using a name, it has to be in the namespace where the construct is defined.
An element name must be in the namespace of the schema for which it is declared.
But they are QNames so you need to declare the namespaces just like in XSLT.
Example: "my:name"
The local name is 'name'.
The namespace name is the URI to which the prefix 'my' resolves.
So, you should re-declare your target namespace:
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
           targetNamespace='urn:publicid:IDN+cde.berkeley.edu:...'
           xmlns:my='urn:publicid:IDN+cde.berkeley.edu:...'>
  ...
</xs:schema>
#19
Children - Specifying an Element
Referring to an element:
<xs:element ref="my:name"/>
The attribute 'minOccurs' specifies the minimum number of these child elements (default is 1).
The attribute 'maxOccurs' specifies the maximum number of these child elements (default is 1).
<xs:element ref="my:name" minOccurs='2' maxOccurs='5'/>
This says: "element 'my:name' can occur between 2 and 5 times here."
The attribute 'maxOccurs' can have the value 'unbounded'.
<xs:element ref="my:name" minOccurs='2' maxOccurs='unbounded'/>
This says: "element 'my:name' can occur 2 or more times here."
Examples in context:
<xs:element name="people">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="my:name" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
#20
Sequences
The sequence element specifies an exact sequence of children elements.
Example
<xs:sequence>
  <xs:element ref="my:name"/>
  <xs:element ref="my:address"/>
  <xs:element ref="my:ssn"/>
</xs:sequence>
This says: "The child sequence 'my:name', 'my:address', 'my:ssn' must occur here."
You can use 'minOccurs' and 'maxOccurs' on sequences:
<xs:sequence maxOccurs="unbounded">
  <xs:element ref="my:name"/>
  <xs:element ref="my:address"/>
  <xs:element ref="my:ssn"/>
</xs:sequence>
This allows the sequence 'my:name', 'my:address', 'my:ssn' to repeat over and over again.
#21
Choices
The choice element specifies a set of choices.
Only one of the children can occur.
Example
<xs:choice>
  <xs:element ref="my:name"/>
  <xs:element ref="my:address"/>
  <xs:element ref="my:ssn"/>
</xs:choice>
This says: "Either 'my:name', 'my:address', or 'my:ssn' must occur here but not more than one."
You can use 'minOccurs' and 'maxOccurs' on choices.
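A repeating choice is a common way to say "any number of these, in any order". The following is only a sketch: the namespace, the 'contact' element, and the decision to type the three children as xs:string are assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="urn:publicid:IDN+example.org:schema:demo"
           targetNamespace="urn:publicid:IDN+example.org:schema:demo">
  <xs:element name="name" type="xs:string"/>
  <xs:element name="address" type="xs:string"/>
  <xs:element name="ssn" type="xs:string"/>

  <xs:element name="contact">
    <xs:complexType>
      <!-- Each repetition picks one of the three children.
           minOccurs="0" also allows no children at all. -->
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="my:name"/>
        <xs:element ref="my:address"/>
        <xs:element ref="my:ssn"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this model, a 'my:contact' holding 'my:ssn', then 'my:name', then another 'my:name' is acceptable, as is an empty 'my:contact'.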
#22
Attributes
Attributes can be declared in the 'complexType' element via the 'attribute' element.
Example:
<xs:attribute name="href" type="xs:anyURI"/>
The 'use' attribute specifies optionality and can have values:
optional - the attribute doesn't have to be present.
required - the attribute must be present.
prohibited - the attribute is prohibited from being used (for type derivation cases).
Don't worry about the value 'prohibited' for now...
In context:
<xs:element name="link">
  <xs:complexType>
    <xs:attribute name="href" type="xs:anyURI"/>
  </xs:complexType>
</xs:element>
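The 'use' attribute does not appear in the example above, where it takes its default value of 'optional'. A variation that makes 'href' mandatory might look like this. The 'title' attribute and the namespace are invented for the sketch.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="urn:publicid:IDN+example.org:schema:demo"
           targetNamespace="urn:publicid:IDN+example.org:schema:demo">
  <xs:element name="link">
    <xs:complexType>
      <!-- Must be present on every 'link' element. -->
      <xs:attribute name="href" type="xs:anyURI" use="required"/>
      <!-- May be left off; use="optional" is the default, spelled out here. -->
      <xs:attribute name="title" type="xs:string" use="optional"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under these assumptions, a 'my:link' with only an 'href' attribute is valid, while a 'my:link' with only a 'title' attribute is not.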
#23
Fixed or Defaulted Attributes
You can fix an attribute value by the 'fixed' attribute on the declaration:
<xs:attribute name="color" type="xs:string" fixed="blue"/>
You can also default the value by the 'default' attribute on the declaration:
<xs:attribute name="color" type="xs:string" default="blue"/>
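To see how the two differ, here is a sketch with one defaulted and one fixed attribute on the same element. The 'pen' element, the 'units' attribute, and the namespace are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="urn:publicid:IDN+example.org:schema:demo"
           targetNamespace="urn:publicid:IDN+example.org:schema:demo">
  <xs:element name="pen">
    <xs:complexType>
      <!-- Defaulted: any string is allowed; 'blue' is supplied when absent. -->
      <xs:attribute name="color" type="xs:string" default="blue"/>
      <!-- Fixed: if the attribute is present, its value must be 'mm'. -->
      <xs:attribute name="units" type="xs:string" fixed="mm"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this schema, a 'my:pen' with color="red" is valid, a 'my:pen' with no attributes is valid (and is treated as having color="blue" and units="mm" after validation), and a 'my:pen' with units="in" is invalid.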
#24
Attributes on Simple Typed Elements
If you want to have an attribute on an element with simple typed content (e.g. string), you have to "extend" that type:
<xs:element name="a">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="href" type="xs:anyURI"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
This "extends" the simple type 'xs:string' to have an attribute.
Type extension will be talked about in the next lecture.
#25
Mixed Content
This is mixed content:
<p>Hello <a href="http://www.w3.org">W3C</a></p>
i.e. elements and text at the same level.
There's a simple flag for this:
<xs:element name="p">
  <xs:complexType mixed='true'>
    <xs:sequence>
      <xs:element ref="my:a" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
But you can't type the text. It's just a string (e.g. you can't say it's an integer).
#26
Qualified vs Unqualified
Element and attribute names can be qualified or unqualified if they are declared "locally".
Most attributes are declared in the context of declaring the element.
You could force them to be "qualified" and require a namespace name/prefix, but that isn't what people normally do.
On the other hand, for elements declared locally (next slide), there is no such "normal" way.
You can set up schema defaults by setting the 'attributeFormDefault' and 'elementFormDefault' attributes on the 'schema' element:
<xs:schema elementFormDefault="qualified"> </xs:schema>
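For completeness, forcing a single attribute to be qualified can be sketched with the 'form' attribute on its declaration. The namespace below is made up, and as noted above this is not what people normally do.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="urn:publicid:IDN+example.org:schema:demo"
           targetNamespace="urn:publicid:IDN+example.org:schema:demo">
  <xs:element name="link">
    <xs:complexType>
      <!-- form="qualified" puts this locally declared attribute
           into the target namespace. -->
      <xs:attribute name="href" type="xs:anyURI" form="qualified"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance would then have to prefix the attribute, as in 'my:href' on a 'my:link' element, where 'my' is bound to the target namespace. Without form="qualified" (and without 'attributeFormDefault' set to 'qualified'), the plain 'href' is what validates.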
#27
Local Element Declarations
You can declare elements inside content models:
Example:
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element form="qualified" name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
which validates:
<my:person><my:name>Milowski</my:name></my:person>
If 'elementFormDefault' is 'qualified', then you can use:
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
But if 'form' or 'elementFormDefault' is 'unqualified' (the default), then your instance must be:
<my:person><name>Milowski</name></my:person>
#28
Forcing Qualified Local Elements
The 'form' attribute can make it qualified:
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" form='qualified' type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
which validates:
<my:person><my:name>Milowski</my:name></my:person>
You can set up schema defaults by setting 'elementFormDefault' to 'qualified' and all local elements will be qualified by default:
<xs:schema elementFormDefault="qualified"> </xs:schema>
#29
Empty Elements?
This is the simplest declaration:
<xs:element name="stop"/>
which allows this to be valid:
<stop>
  <junk>I am a junk element with low self-esteem.</junk>
</stop>
This is NOT an empty element.
The type is xs:anyType which is mixed and allows any children.
This is an empty element:
<xs:element name="stop">
  <xs:complexType/>
</xs:element>
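"Empty" here refers to content only. An element with an empty content model can still carry attributes, as in this sketch, where the 'reason' attribute is an assumption added for illustration and no target namespace is used.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="stop">
    <!-- No particle inside complexType, so no children and no text,
         but attributes are still allowed. -->
    <xs:complexType>
      <xs:attribute name="reason" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this declaration, a 'stop' element with reason="done" and no content is valid, while a 'stop' element containing a 'junk' child is rejected.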
#30
Catalogs
A catalog maps namespaces to schema documents.
It's an XML document in the namespace: urn:oasis:names:tc:entity:xmlns:xml:catalog
The spec is at: OASIS's Website
There are two things to be concerned with:
Mapping URI values that start with "urn:publicid:..."
Everything else.
Example:
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog" prefer='public'>
  <uri name="http://cde.berkeley.edu/~milowski/schemas/example-form/event/200402"
       uri="event.xsd"/>
  <public publicId="IDN cde.berkeley.edu//milowski//schemas//example-form//event//200402"
          uri="event.xsd"/>
</catalog>
#31
Public Identifiers and URNs
Any URN that starts with "urn:publicid:..." is a public identifier.
It's a formal way of naming a resource (e.g. a schema).
The specification at the OASIS site will tell you more.
The URN gets mapped to a public identifier string:
URN Value: urn:publicid:IDN+cde.berkeley.edu:milowski:schemas:example-form:event:200402
Public Identifier: IDN cde.berkeley.edu//milowski//schemas//example-form//event//200402
So you just use the 'public' element in the catalog.
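Applying the same mapping to the slide namespace shown earlier, a catalog entry might be sketched as follows. The file name slides.xsd is an assumption; the real location of that schema is not given here.

```xml
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog" prefer="public">
  <!-- Namespace name: urn:publicid:IDN+mathdoc.org:schema:slides:2004:1.0:us
       '+' becomes a space and ':' becomes '//' in the public identifier. -->
  <public publicId="IDN mathdoc.org//schema//slides//2004//1.0//us"
          uri="slides.xsd"/>
</catalog>
```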