Even More Schema!
Documentation, Complex Type Derivation, and the PSVI
R. Alexander Milowski
milowski@sims.berkeley.edu
School of Information Management and Systems
#1
Documentation
You can document your schemata in an XML syntax with the 'annotation' element.
It has two element children:
xs:appinfo - typically for additional application information (e.g. relational tables, constraint rules, Java method mappings).
xs:documentation - intended to contain the human-readable documentation for the construct.
You can put anything inside these elements (e.g. XHTML).
Example:
<xs:annotation>
  <xs:appinfo><java:method name="setBingo"/></xs:appinfo>
  <xs:documentation xmlns="http://www.w3.org/1999/xhtml">
    <p>This element represents a bingo game.</p>
  </xs:documentation>
</xs:annotation>
#2
Documenting an Element
The first child of 'element' can be an 'annotation' element.
Example:
<xs:element name="person-name">
  <xs:annotation>
    <xs:documentation xmlns="http://www.w3.org/1999/xhtml">
      <p>This element represents a person's name. The middle name is optional.</p>
    </xs:documentation>
  </xs:annotation>
  ...
</xs:element>
#3
Documenting a Type
The first child of 'complexType' or 'simpleType' can be an 'annotation' element.
Example:
<xs:complexType name="XHTMLBlockContents">
  <xs:annotation>
    <xs:documentation xmlns="http://www.w3.org/1999/xhtml">
      <p>This type represents a container of XHTML block elements.</p>
    </xs:documentation>
  </xs:annotation>
  ...
</xs:complexType>
#4
Documenting an Attribute
The first child of 'attribute' can be an 'annotation' element.
Example:
<xs:attribute name="src" type="xs:anyURI">
  <xs:annotation>
    <xs:documentation xmlns="http://www.w3.org/1999/xhtml">
      <p>This attribute is a link to an image object. The value can be a relative URL.</p>
    </xs:documentation>
  </xs:annotation>
</xs:attribute>
#5
Fully Document Your Schemata
If you're "good", you'll do this.
...but we're not always "good",... like those comments in your Java code, right?
Example:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="...">
  <!-- there is one main information annotation for the schema namespace -->
  <xs:annotation>
    <xs:documentation xmlns="http://www.w3.org/1999/xhtml">
      <p>This schema is used for...</p>
    </xs:documentation>
  </xs:annotation>
  <!-- every element has an annotation -->
  <xs:element name="...">
    <xs:annotation> ... </xs:annotation>
    ...
  </xs:element>
  <!-- every type has an annotation -->
  <xs:complexType name="...">
    <xs:annotation> ... </xs:annotation>
    ...
  </xs:complexType>
  <!-- etc. -->
</xs:schema>
#6
Documentation Example
Here's my documented XHTML schema: xhtml.xsd
Here's the processed documentation.
#7
Complex Type Derivation
You can derive complex types from other complex types.
Just like in OO languages, an instance of the derived type is also an instance of its super type.
But what does that mean for XML?
#8
Valid against the Super Type?
Each parent-child relationship of the super type is preserved.
The order of the children of the super type is preserved.
The attributes of the super type are preserved.
If something is optional, it's OK to disallow it in the derived type.
If something occurs multiple times, it may be OK to restrict that occurrence.
#9
Example
Each of these elements could have a super type:
<t:student id="s1">
  <name>Jimmy</name>
</t:student>
<t:parent id="p1">
  <name>Jimmy's Mom</name>
  <address>
    <street>123 Mars</street>
    <city>San Francisco</city><state>CA</state>
  </address>
  <children>
    <student ref="s1"/>
  </children>
</t:parent>
<t:teacher>
  <name>Dr. Evil</name>
  <students>
    <student ref="s1"/>
  </students>
</t:teacher>
These could all have a base type of:
<xs:complexType name="PersonInfo">
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
  </xs:sequence>
  <xs:attribute name="id" type="xs:ID"/>
</xs:complexType>
#10
Extension vs. Restriction vs. Simple Type Restriction
Schema gives you these derivation options:
extension - adding elements to the end of the children or adding attributes.
Example: Teacher can add 'students' to the 'PersonInfo' type.
restriction - removing optional elements or attributes.
Example: Teacher can remove the optional 'id' attribute.
You can use these to restrict/extend either simple or complex types.
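The restriction case can be sketched as follows. This is only an illustration: it assumes the 'PersonInfo' type from slide #9 and a 'my' prefix bound to the schema's target namespace, and the type name 'AnonymousPerson' is hypothetical.

```xml
<xs:complexType name="AnonymousPerson">
  <xs:complexContent>
    <xs:restriction base="my:PersonInfo">
      <!-- the content model is repeated from the base type -->
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
      <!-- the optional 'id' attribute is removed -->
      <xs:attribute name="id" type="xs:ID" use="prohibited"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```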
#11
Simple Type Restriction/Extension as a Complex Type
The 'simpleContent' element is used to restrict/extend a simple type.
It has either a 'restriction' or an 'extension' element as a child.
Use 'extension' when you want to add attributes and have an element value.
Use 'restriction' when you want to narrow the element value (or the attributes) of a type that already has simple content.
Example:
<xs:complexType name="Anchor">
  <xs:simpleContent>
    <xs:extension base="xs:string">
      <xs:attribute name="href" type="xs:anyURI"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
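For the 'restriction' case, here is a minimal sketch. It assumes the 'Anchor' type above lives in a namespace bound to a hypothetical 'my' prefix; the name 'ShortAnchor' and the 40-character limit are made up for illustration.

```xml
<xs:complexType name="ShortAnchor">
  <xs:simpleContent>
    <xs:restriction base="my:Anchor">
      <!-- limit the string value; the 'href' attribute is inherited -->
      <xs:maxLength value="40"/>
    </xs:restriction>
  </xs:simpleContent>
</xs:complexType>
```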
#12
Complex Content
The element 'complexContent' can be used to specify derivation.
It can also be used to wrap simple definitions that aren't derivations.
These are the same:
<xs:complexType name="Pair">
  <xs:sequence>
    <xs:element name="A" type="xs:string"/>
    <xs:element name="B" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="Pair">
  <xs:complexContent>
    <xs:restriction base="xs:anyType">
      <xs:sequence>
        <xs:element name="A" type="xs:string"/>
        <xs:element name="B" type="xs:string"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
Usability?
#13
Extending a Complex Type - Elements
The element 'extension' is used to specify a complex type extension derivation.
You put it in the 'complexContent' element:
<xs:complexType name="Teacher">
  <xs:complexContent>
    <xs:extension base="my:PersonInfo">
      <!-- the new elements (appended) go here, inside a model group -->
      <xs:sequence>
        <xs:element name="students">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="student" type="my:Student" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
That derivation added an element 'students' to the end of the children of 'PersonInfo'.
Notice how the same rules for in-lining and referring to types apply.
#14
Extending a Complex Type - Attributes
You can also just add attributes by extension:
This adds an 'id' attribute to the previous type:
<xs:complexType name="TeacherWithId">
  <xs:complexContent>
    <xs:extension base="my:Teacher">
      <xs:attribute name="id" type="xs:ID"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
#15
The Teacher/Parents/Students Example
Here's the schema for the teacher/parent/student example from slide #9:
The schema: teacher.xsd
A conforming document: teacher.xml
#16
Extension and the Instance
The same rules apply to derived complex types as for "normal" complex types.
That is, you need to consider where the declarations occur to figure out what namespaces are required.
You can get a "rainbow" of namespaces from deep type derivations.
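As an illustration of such a "rainbow", here is a sketch of an instance. It assumes 'PersonInfo' is defined in one schema and extended by 'Teacher' in another, both with elementFormDefault="qualified"; the two namespace URIs are hypothetical.

```xml
<!-- 'name' is declared by PersonInfo in the (assumed) person schema;
     'students' is added by the Teacher extension in the (assumed) school schema -->
<s:teacher xmlns:s="urn:example:school"
           xmlns:p="urn:example:person">
  <p:name>Dr. Evil</p:name>
  <s:students>
    <s:student ref="s1"/>
  </s:students>
</s:teacher>
```

Each level of derivation that lives in its own namespace contributes another prefix to the instance.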
#17
Polymorphic Content
You can have instances with derived types.
Your original schema doesn't have to know about it.
Their extensions/restrictions are allowed in the instance.
But they have to tell the processor about the derived type:
<student xsi:type='my:SpecialStudent'>
  <my:xml-knowledge>XSLT XML Schema</my:xml-knowledge>
</student>
And a possible type for this:
<xs:complexType name="SpecialStudent">
  <xs:complexContent>
    <xs:extension base="my:Student">
      <!-- added elements must sit inside a model group -->
      <xs:sequence>
        <xs:element name="xml-knowledge" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
#18
Abstract Types
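Any complex type can be declared abstract by adding the 'abstract' attribute with a value of 'true'.
This means no element in an instance can be validated directly against that type; a non-abstract derived type must be used instead.
The derived type is supplied with xsi:type or by a member of a substitution group (next slide).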
Example:
<xs:complexType name="Person" abstract="true">
  <xs:sequence>
    <xs:element name="name" type="my:PersonName"/>
  </xs:sequence>
</xs:complexType>
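A sketch of how an abstract type might be used in practice. The derived type 'StaffMember', its 'role' element, the global 'person' element, and the namespace URI are all hypothetical; local elements are assumed to be unqualified (the default).

```xml
<!-- schema: a concrete type derived from the abstract 'Person' -->
<xs:complexType name="StaffMember">
  <xs:complexContent>
    <xs:extension base="my:Person">
      <xs:sequence>
        <xs:element name="role" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
<xs:element name="person" type="my:Person"/>

<!-- instance: the element names a concrete derived type with xsi:type -->
<my:person xmlns:my="urn:example:my"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:type="my:StaffMember">
  <name>...</name>
  <role>librarian</role>
</my:person>
```

Without the xsi:type attribute, the 'person' element would be invalid because its declared type is abstract.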
#19
Substitution Groups
A substitution group lets you use elements as placeholders.
It is a way to have an extensible "choice" in your content model.
Elements elect to "join" the substitution group by having a 'substitutionGroup' attribute:
<xs:element name="a" type="xhtml:Anchor" substitutionGroup="xhtml:inline"/>
The target, in this case 'xhtml:inline', is called the representative member of the substitution group.
The type of the element must be derived from the type of the representative member.
#20
Example - XHTML Blocks and Inlines
I've created my own XHTML schema that uses substitution groups.
This makes XHTML more extensible.
Here's the schema: xhtml.xsd
#21
Abstract Elements
An abstract element is like an abstract type.
It makes the representative member of the substitution group unusable in instances, so only the other members can appear.
Example:
<xs:element name="inline" abstract="true" type="xhtml:Inline"/>
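Putting the pieces together, a content model can refer to the abstract element as a placeholder. This is a sketch, not the contents of xhtml.xsd: the 'em' element and the 'Paragraph' type are hypothetical, and 'xhtml:Anchor' is assumed to derive from 'xhtml:Inline'.

```xml
<!-- the abstract representative member: never appears in an instance itself -->
<xs:element name="inline" abstract="true" type="xhtml:Inline"/>

<!-- members that may appear wherever 'inline' is referenced -->
<xs:element name="a" type="xhtml:Anchor" substitutionGroup="xhtml:inline"/>
<xs:element name="em" type="xhtml:Inline" substitutionGroup="xhtml:inline"/>

<!-- a content model that uses the abstract element as a placeholder -->
<xs:complexType name="Paragraph" mixed="true">
  <xs:sequence>
    <xs:element ref="xhtml:inline" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
```

A new inline element can later join the group from another schema without touching 'Paragraph'.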
#22
Example - Fruit Basket
This example uses substitution groups to allow different kinds of fruit: fruit-basket.xsd
The citrus fruit are defined in a separate schema: citrus-fruits.xsd
Here's a "master schema" that imports both: rotten-fruit.xsd
These should validate:
fruit-order-1.xml : Validates in XMLMind, XSV, and Xerces (using a catalog).
fruit-order-2.xml : Validates in XMLMind and XSV, but not in Xerces (a "feature"?)
fruit-order-3.xml : Validates in XSV and Xerces but not XMLMind (a bug?)
#23
schemaLocation - I Lied!
OK, I didn't really lie... I just omitted something so as not to confuse you.
schemaLocation attributes in the instance are hints.
Processors are free to ignore your hints.
Which is really annoying when your hint is correct.
Thag say: "schemaLocation bad. BAD!!!"
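For reference, this is what such a hint looks like in an instance. The attribute value is a list of namespace-name/location pairs; the 'order' element and the namespace URI are hypothetical.

```xml
<f:order xmlns:f="urn:example:fruit"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="urn:example:fruit fruit-basket.xsd">
  ...
</f:order>
```

A processor may load fruit-basket.xsd for that namespace, or it may use a schema it obtained some other way (for example, through a catalog).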
#24
Substitution vs. xsi:type
A substitution group allows you to formally define an extension to a content model.
The xsi:type attribute allows ad-hoc extensions.
The xsi:type also allows room for unanticipated needs.
You can't change the element name with xsi:type but you can with substitution groups.
Substitution groups are better for tools (see XMLMind Demo).
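The difference shows up in the instance. In this sketch the global 'student' and 'special-student' elements are hypothetical declarations built around the 'Student' and 'SpecialStudent' types from the earlier slides.

```xml
<!-- schema (assumed): 'special-student' joins the group headed by 'student' -->
<xs:element name="student" type="my:Student"/>
<xs:element name="special-student" type="my:SpecialStudent"
            substitutionGroup="my:student"/>

<!-- instance, option 1: xsi:type keeps the element name and swaps the type -->
<my:student xsi:type="my:SpecialStudent"> ... </my:student>

<!-- instance, option 2: the substitution group swaps the element name -->
<my:special-student> ... </my:special-student>
```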
#25
Restricting Complex Types
When you restrict a type, you remove content possibilities.
But you can only:
Remove optionality.
Prohibit elements/attributes.
Limit values (e.g. restrict them).
#26
Restricting Complex Types - Example
In restrictions, you repeat the content model with the restrictions in place:
<xs:complexType name="Product">
  <xs:sequence>
    <xs:element name="number" type="xs:string"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="size" type="my:Size" minOccurs="0"/>
    <xs:element name="color" type="my:Colors" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<!-- removes optionality: 'size' is now required -->
<xs:complexType name="ProductWithSize">
  <xs:complexContent>
    <xs:restriction base="my:Product">
      <xs:sequence>
        <xs:element name="number" type="xs:string"/>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="size" type="my:Size"/>
        <xs:element name="color" type="my:Colors" minOccurs="0"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>

<!-- prohibits the optional 'size' and 'color' elements -->
<xs:complexType name="MinimalProduct">
  <xs:complexContent>
    <xs:restriction base="my:Product">
      <xs:sequence>
        <xs:element name="number" type="xs:string"/>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
#27
PSVI
PSVI - Post-Schema-Validation Infoset.
This is the result of validating a document against a schema.
This is very different from DTDs or RELAX NG.
The schema processor annotates the information set with validation information.
This isn't in your book but it is in the XML Schema specification.
#28
PSVI Element Annotations
[validation context] - The element ancestor that has a global element declaration. This can be the current element.
[validity] - One of 'valid', 'invalid', or 'notKnown'.
[validation attempted] - One of 'full', 'none', or 'partial'. The value 'partial' means that some children have been validated but not all.
[element declaration] - The schema element declaration used to validate this element.
There is more but these are the important ones.
#29
PSVI Attribute Annotations
[validation context] - The element ancestor that has a global element declaration. This can be the current element.
[validity] - One of 'valid', 'invalid', or 'notKnown'.
[validation attempted] - One of 'full' or 'none'.
[attribute declaration] - The schema attribute declaration used to validate this attribute.
[schema specified] - One of 'infoset' or 'schema'.
And there is more here too!
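The [schema specified] annotation is easiest to see with a defaulted attribute. A minimal sketch, in which the 'note' element and its 'lang' attribute are hypothetical:

```xml
<!-- schema: 'lang' has a default value -->
<xs:element name="note">
  <xs:complexType>
    <xs:attribute name="lang" type="xs:language" default="en"/>
  </xs:complexType>
</xs:element>

<!-- instance A: the attribute is written out, so [schema specified] is 'infoset' -->
<note lang="fr"/>

<!-- instance B: the attribute is omitted; the processor adds lang="en"
     to the PSVI with [schema specified] set to 'schema' -->
<note/>
```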
#30
PSVI Example
There's a tool I wanted you all to have from Univ. of Edinburgh.
But it isn't working 100%.
But here's a stylesheet that peeks at the PSVI and marks bad elements in red: valid.xsl
#31
Schema Components
Every major schema construct is mapped to an abstract component.
The specification says how these components can be used to accomplish validation.
The major components are:
element declaration
attribute declaration
simple and complex type definition
attribute group definition
model group definition
schema
#32
Schema Information Component
There is one schema component regardless of how many namespaces you have.
It's a "consistent world view".
From this you can get all the definitions and declarations of everything.
There's a huge wealth of information in the components.
The recommendation gives you a standard interpretation of the schema.
#33
Element Declaration Component
This is the "resolved" declaration:
[name] - the local name of the element.
[target namespace] - the namespace name of the element.
[type definition] - the type definition component.
[scope] - Either 'global' or the type definition component in which the element was declared.
From this component, you can directly access the type used to validate the element.
There's more here too!
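To make the properties concrete, here is a sketch of a global and a local declaration with the component properties noted in comments. The namespace URI and the simplified 'Teacher' type are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="urn:example:my"
           targetNamespace="urn:example:my">
  <!-- [name] = teacher, [target namespace] = urn:example:my,
       [scope] = global, [type definition] = the 'Teacher' type -->
  <xs:element name="teacher" type="my:Teacher"/>

  <xs:complexType name="Teacher">
    <xs:sequence>
      <!-- [name] = name, [scope] = the 'Teacher' complex type definition;
           [target namespace] is absent because local elements are
           unqualified by default -->
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```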
#34
Complex Type Definition Component
This is the "resolved" definition:
[name] - the local name of the type.
[target namespace] - the namespace name of the type.
[base type definition] - the type definition component for the base type (if there is one).
From this component, you can directly access the type derivation chain.
And yet more here too!
#35
Components Example
The schema: citrus-fruits.xsd
The schema dump: citrus-fruits.sdump
A styled version: citrus-fruits.inventory.html