More XML Schema!
Composition, Documentation, Re-usable Types, Simple Type Derivation, and the PSVI
R. Alexander Milowski
milowski@sims.berkeley.edu
School of Information Management and Systems
#1
Re-usable Complex Types
You can name a complex type at the top level:
<xs:complexType name="person"> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="age" type="xs:integer"/> </xs:sequence> </xs:complexType>
And then you point to them with elements:
<xs:element name="person" type="my:person"/> <xs:element name="student" type="my:person"/> <xs:element name="parent" type="my:person"/>
#2
Only Elements are in the Document
Even though we referred to the type:
<xs:element name="student" type="my:person"/> <xs:element name="parent" type="my:person"/>
The elements don't use the type's name:
<my:student> <name>Jimmy Smith</name> <age>2</age> </my:student> <my:parent> <name>Jane Smith</name> <age>33</age> </my:parent>
#3
Schema Composition
Schema provides two facilities for modularity:
xs:include - includes a set of definitions/declarations as if they were inlined (cut-n-paste).
xs:import - imports a different namespace so you can use its definitions/declarations.
#4
xs:include
The 'include' element occurs at the top level.
Example:
<xs:include schemaLocation="some-module.xsd"/> <xs:include schemaLocation="http://cde.berkeley.edu/schemas/chunk-of-something.xsd"/>
The target namespace of the included schema must either:
match the current target namespace (i.e. the targetNamespace attributes have the same value), or
have no value (i.e. no targetNamespace attribute).
All top-level definitions and declarations in the included schema become part of the current target namespace.
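A minimal sketch of the second case, where the included module has no targetNamespace. The file name address-module.xsd, the namespace http://example.org/my, and the type and element names are hypothetical, chosen only for illustration:

```xml
<!-- address-module.xsd (hypothetical): note there is no targetNamespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="address">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

```xml
<!-- the including schema (hypothetical) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="http://example.org/my"
           targetNamespace="http://example.org/my">
  <xs:include schemaLocation="address-module.xsd"/>
  <!-- the included 'address' type now belongs to http://example.org/my -->
  <xs:element name="home" type="my:address"/>
</xs:schema>
```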
#5
xs:import
The 'import' element occurs at the top level.
Example:
<xs:import namespace="urn:publicid:IDN+cde.berkeley.edu:schemas:something:us" schemaLocation="something.xsd"/> <xs:import namespace="http://www.w3.org/1999/xhtml" schemaLocation="xhtml.xsd"/>
The 'namespace' attribute specifies what namespace you are attempting to import.
The 'schemaLocation' attribute is not required, but recommended.
You still have to define a prefix to refer to definitions/declarations.
#6
Example of Import
This schema imports a schema for XHTML.
It then refers to the element declaration for the 'p' element.
Example:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:publicid:IDN+cde.berkeley.edu:schemas:examples:snippet:us" xmlns:h="http://www.w3.org/1999/xhtml" > <xs:import namespace="http://www.w3.org/1999/xhtml" schemaLocation="xhtml.xsd"/> <xs:element name="snippet"> <xs:complexType> <xs:sequence> <xs:element ref="h:p" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
#7
Import Chains
Imported schemas can import schemas which import schemas which...
Declarations/Definitions belong to the target namespace of the schema document where they occur (i.e. the value of the targetNamespace attribute nearest to them).
You don't have to specify where imports of imported schemas are located.
At minimum you need to import the namespace of all definitions/declarations used.
#8
Three Simple Rules for the Instance
Global declarations require a qualified name in the instance.
Local declarations are unqualified unless you specifically say otherwise.
If your declaration requires a qualified name, the namespace is the value of the 'targetNamespace' of the document where the declaration occurs.
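The three rules can be seen side by side in a small sketch. The namespace http://example.org/my and the element names are assumptions made up for this illustration:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/my">
  <!-- global declaration: needs a qualified name in the instance -->
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <!-- local declaration: unqualified by default -->
        <xs:element name="name" type="xs:string"/>
        <!-- local declaration that specifically asks to be qualified -->
        <xs:element name="age" type="xs:integer" form="qualified"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A conforming instance then mixes qualified and unqualified names. The namespace of the qualified ones is the schema's targetNamespace:

```xml
<my:person xmlns:my="http://example.org/my">
  <name>Jane Smith</name>
  <my:age>33</my:age>
</my:person>
```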
#9
Import Chains Example
This should work in all processors:
Schema A imports schema B.
Schema B imports schema C.
Schema A uses a declaration from schema C.
Example:
Schema A: ichain-1.xsd
Schema B: ichain-2.xsd
Schema C: ichain-3.xsd
Instance: ichain.xml
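For readers without the files at hand, here is a rough sketch of what such a chain could look like. These are not the contents of the ichain files; the namespaces urn:example:a, urn:example:b, urn:example:c, the file names, and the element names are all hypothetical:

```xml
<!-- c.xsd (hypothetical schema C) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:c">
  <xs:element name="note" type="xs:string"/>
</xs:schema>
```

```xml
<!-- b.xsd (hypothetical schema B): imports C and says where to find it -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:b">
  <xs:import namespace="urn:example:c" schemaLocation="c.xsd"/>
  <xs:element name="item" type="xs:string"/>
</xs:schema>
```

```xml
<!-- a.xsd (hypothetical schema A): uses a declaration from C -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:b="urn:example:b"
           xmlns:c="urn:example:c"
           targetNamespace="urn:example:a">
  <xs:import namespace="urn:example:b" schemaLocation="b.xsd"/>
  <!-- A refers to c:note, so it imports namespace C; the location is
       left out here and is expected to come from B's import -->
  <xs:import namespace="urn:example:c"/>
  <xs:element name="doc">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="b:item"/>
        <xs:element ref="c:note"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```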
#10
Import Chains and the Instance
What happens when your imports have imports?
Declarations are bound to the target namespace of where they occur.
Unqualified local element declarations mean no namespace for the element in the instance.
Example:
Base Schema: base.xsd
Importing Schema: importer.xsd
Conforming Document: importer.xml
#11
Local Elements and Importing
Unqualified local elements have no namespace name.
So, even though their types are in different namespaces, the elements aren't.
Example:
Type Library: type-library.xsd
Document Schema: person-location.xsd
Conforming Document: person-location.xml
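The same idea in a compact sketch. This is not the content of the files listed above; the namespaces urn:example:types and urn:example:doc, the file name types.xsd, and the element names are hypothetical:

```xml
<!-- types.xsd (hypothetical type library) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:types">
  <xs:complexType name="location">
    <xs:sequence>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="country" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

```xml
<!-- hypothetical document schema that uses the library type -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:t="urn:example:types"
           targetNamespace="urn:example:doc">
  <xs:import namespace="urn:example:types" schemaLocation="types.xsd"/>
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="home" type="t:location"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Only the global element carries a namespace in the instance. The local elements have none, no matter which namespace their type came from:

```xml
<d:person xmlns:d="urn:example:doc">
  <name>Jane Smith</name>
  <home>
    <city>Berkeley</city>
    <country>US</country>
  </home>
</d:person>
```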
#12
A Rainbow of Namespaces
If you qualify all your local elements, you may end up with many different namespaces in your instance.
Having deep schema derivations makes this worse.
Having five namespaces isn't unrealistic, but this example is silly:
Five levels of derivation: level1.xsd level2.xsd level3.xsd level4.xsd level5.xsd
Conforming Document: level.xml
#13
Qualifying All Local Elements
You can control qualifying local element declarations at the schema level.
Add the attribute 'elementFormDefault' to the 'xs:schema' element.
This specifies a semantic default for the 'form' attribute of a local element declaration.
For example, this qualifies all local element declarations:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://..." elementFormDefault="qualified" > ... </xs:schema>
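To see the effect in an instance, consider this sketch. The namespace http://example.org/my and the element names are assumptions for illustration:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/my"
           elementFormDefault="qualified">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <!-- local declarations, but now qualified by default -->
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Since every element is now in the target namespace, a single default namespace declaration in the instance covers all of them:

```xml
<person xmlns="http://example.org/my">
  <name>Jane Smith</name>
  <age>33</age>
</person>
```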
#14
Simple Type Derivation
Simple type derivations fall into one of three categories:
value restriction - restricting the values of an existing type (e.g. the numbers between 2 and 5)
lists - lists of a specific type (e.g. a list of dates).
unions - a choice among several types (e.g. an integer or one of a set of keywords).
You can restrict built-in simple types or your own types.
#15
Restricting Values
The basic structure is:
<xs:simpleType name="sometype"> <xs:restriction base="parent"> ... </xs:restriction> </xs:simpleType>
The 'base' attribute points to the type you are restricting.
All values must be valid values in the base type's value space.
Inside the 'restriction' element you place your restrictions!
All restrictions are elements with a 'value' attribute.
#16
Range Restrictions
They use these elements:
minInclusive - a minimum where the boundary value can be used.
maxInclusive - a maximum where the boundary value can be used.
minExclusive - a minimum where the boundary value cannot be used.
maxExclusive - a maximum where the boundary value cannot be used.
The maximum must not be smaller than the minimum.
The integers between 2 and 5:
<xs:simpleType name="score"> <xs:restriction base="xs:integer"> <xs:minInclusive value="2"/> <xs:maxInclusive value="5"/> </xs:restriction> </xs:simpleType>
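The exclusive variants work the same way. A small sketch, with a hypothetical type name, of a decimal strictly between 0 and 100:

```xml
<xs:simpleType name="openPercent">
  <xs:restriction base="xs:decimal">
    <!-- 0 and 100 themselves are not allowed; 0.01 and 99.99 are -->
    <xs:minExclusive value="0"/>
    <xs:maxExclusive value="100"/>
  </xs:restriction>
</xs:simpleType>
```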
#17
Length Restrictions
They use these elements:
length - the exact length of the value (for a string, the number of characters).
minLength - the minimum length of the value.
maxLength - the maximum length of the value.
The maximum must not be smaller than the minimum.
A string with two characters:
<xs:simpleType name="code"> <xs:restriction base="xs:string"> <xs:length value="2"/> </xs:restriction> </xs:simpleType>
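A range of lengths combines the other two elements. A sketch with a hypothetical type name:

```xml
<xs:simpleType name="username">
  <xs:restriction base="xs:string">
    <!-- between 3 and 16 characters, inclusive -->
    <xs:minLength value="3"/>
    <xs:maxLength value="16"/>
  </xs:restriction>
</xs:simpleType>
```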
#18
Digit Restrictions
These only apply to xs:decimal and the types derived from it (e.g. xs:integer).
They use these elements:
totalDigits - the maximum number of digits (including fractional parts).
fractionDigits - the maximum number of fractional digits.
The fractional digits must not exceed the total digits.
A money amount less than $1 million:
<xs:simpleType name="amount"> <xs:restriction base="xs:decimal"> <xs:totalDigits value='8'/> <xs:fractionDigits value='2'/> </xs:restriction> </xs:simpleType>
#19
Whitespace Restrictions
The 'whiteSpace' element controls the processing of whitespace according to these values:
preserve - all whitespace is preserved.
replace - replaces tabs, line feeds, and carriage returns with spaces.
collapse - removes trailing and leading spaces and replaces multiple whitespace characters with a single space.
Ensuring a type name has no leading, trailing, or repeated whitespace:
<xs:simpleType name="TypeName"> <xs:restriction base="xs:string"> <xs:whiteSpace value="collapse"/> </xs:restriction> </xs:simpleType>
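Note that 'collapse' only normalizes whitespace; a single space between two words survives. If whitespace inside the value should be rejected outright, one option is to add a pattern. A sketch, with a hypothetical type name:

```xml
<xs:simpleType name="StrictTypeName">
  <xs:restriction base="xs:string">
    <!-- leading and trailing whitespace is trimmed first -->
    <xs:whiteSpace value="collapse"/>
    <!-- what remains must be one or more non-whitespace characters -->
    <xs:pattern value="\S+"/>
  </xs:restriction>
</xs:simpleType>
```

With this type, " Person " is accepted (it normalizes to "Person"), while "Person Name" is not.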
#20
Enumerated Value Restrictions
The 'enumeration' element specifies an exact value.
You can have more than one of these to specify different values.
The enumeration value must be an instance of the base type.
A status keyword enumeration:
<xs:simpleType name="Status"> <xs:restriction base="xs:string"> <xs:enumeration value="draft"/> <xs:enumeration value="last call"/> <xs:enumeration value="proposed recommendation"/> <xs:enumeration value="unknown"/> </xs:restriction> </xs:simpleType>
#21
Pattern Restrictions
The 'pattern' element specifies a regular expression for the lexical value.
The book has a whole section on patterns. Read it!!!
If you are familiar with regular expressions, then you'll feel at home.
The syntax is similar to that of the java.util.regex package.
A US Zip+4 code:
<xs:simpleType name="ZipPlus4"> <xs:restriction base="xs:string"> <xs:pattern value="[0-9]{5}(-[0-9]{4})?"/> </xs:restriction> </xs:simpleType>
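One difference from many regular expression libraries is that a schema pattern always has to match the whole value, so no start or end anchors are written. A sketch, with a hypothetical type name:

```xml
<xs:simpleType name="ProductCode">
  <xs:restriction base="xs:string">
    <!-- accepts "AB-123"; rejects "xAB-123" and "AB-1234"
         because the entire value must match -->
    <xs:pattern value="[A-Z]{2}-[0-9]{3}"/>
  </xs:restriction>
</xs:simpleType>
```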
#22
Atomic Values and Types
An atomic value is one that isn't divided by whitespace (e.g. "token", "10", "true").
Simple types are atomic if they aren't lists or unions.
This becomes important when you have lists of values:
A list of integers:
1 2 3 4 5 6 7 8 9 10
This has ten values (duh!).
A list of strings:
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
This is a list of 18 strings--not two or one.
#23
List Simple Types
A list simple type specifies a list of values from an atomic type.
A conforming value is a whitespace-separated sequence of items. An empty list is allowed unless the list type is restricted with a 'minLength' facet.
Example:
<xs:simpleType name="IntegerVector"> <xs:list itemType="xs:integer"/> </xs:simpleType>
An instance:
10 -2 5 12 3
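A list type can itself be restricted, and the length elements then count items rather than characters. A sketch, assuming the 'my' prefix is bound to the schema's target namespace and using a hypothetical type name:

```xml
<xs:simpleType name="Point3D">
  <xs:restriction base="my:IntegerVector">
    <!-- exactly three integers, e.g. "10 -2 5" -->
    <xs:length value="3"/>
  </xs:restriction>
</xs:simpleType>
```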
#24
List Simple Types - Inlining Types
You can define the item type inline, inside the list.
Example:
<xs:simpleType name="MenShoeSizesUS"> <xs:list> <xs:simpleType> <xs:restriction base="xs:integer"> <xs:minInclusive value="7"/> <xs:maxInclusive value="14"/> </xs:restriction> </xs:simpleType> </xs:list> </xs:simpleType>
An instance:
9 10 14
#25
Union Simple Types
A union simple type specifies a union of atomic simple type values.
Example:
<xs:simpleType name="size"> <xs:union memberTypes="xs:integer"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="small"/> <xs:enumeration value="medium"/> <xs:enumeration value="large"/> </xs:restriction> </xs:simpleType> </xs:union> </xs:simpleType>
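A value is valid if it is valid for at least one member type. A sketch of how this looks in use, assuming a hypothetical global element 'shirt-size' and the 'my' prefix bound to the schema's target namespace:

```xml
<!-- in the schema -->
<xs:element name="shirt-size" type="my:size"/>

<!-- in instances: both of these are valid -->
<my:shirt-size>12</my:shirt-size>
<my:shirt-size>medium</my:shirt-size>

<!-- this one is not: "huge" is neither an integer nor an enumerated keyword -->
<my:shirt-size>huge</my:shirt-size>
```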
#26
Documentation
You can document your schemas in an XML syntax with the 'annotation' element.
It has two element children:
xs:appinfo - typically for additional application information (e.g. relational tables, constraint rules, java method mappings).
xs:documentation - intended to contain the human-readable documentation for the construct.
You can put anything inside these elements (e.g. XHTML).
Example:
<xs:annotation> <xs:appinfo><java:method name="setBingo"/></xs:appinfo> <xs:documentation xmlns="http://www.w3.org/1999/xhtml"> <p>This element represents a bingo game.</p> </xs:documentation> </xs:annotation>
#27
Documenting an Element
Example:
<xs:element name="person-name"> <xs:annotation> <xs:documentation xmlns="http://www.w3.org/1999/xhtml"> <p>This element represents a person's name. The middle name is optional.</p> </xs:documentation> </xs:annotation> ... </xs:element>
#28
Documenting a Type
Example:
<xs:complexType name="XHTMLBlockContents"> <xs:annotation> <xs:documentation xmlns="http://www.w3.org/1999/xhtml"> <p>This type represents a container of XHTML block elements.</p> </xs:documentation> </xs:annotation> ... </xs:complexType>
#29
Documenting an Attribute
Example:
<xs:attribute name="src" type="xs:anyURI"> <xs:annotation> <xs:documentation xmlns="http://www.w3.org/1999/xhtml"> <p>This attribute is a link to an image object. The value can be a relative URL.</p> </xs:documentation> </xs:annotation> </xs:attribute>
#30
Fully Document Your Schemata
If you're "good", you'll do this.
...but we're not always "good",... like those comments in your Java code, right?
Example:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="..." > <!-- there is one main information annotation for the schema namespace --> <xs:annotation> <xs:documentation xmlns="http://www.w3.org/1999/xhtml"> <p>This schema is used for...</p> </xs:documentation> </xs:annotation> <!-- every element has an annotation --> <xs:element name="..."> <xs:annotation> ... </xs:annotation> ... </xs:element> <!-- every type has an annotation --> <xs:complexType name="..."> <xs:annotation> </xs:annotation> ... </xs:complexType> <!-- etc. --> </xs:schema>
#31
PSVI
PSVI - Post-Schema-Validation Infoset.
This is the result of validating a document against a schema.
This is very different from DTDs or RELAX NG.
The schema processor annotates the information set with validation information.
This isn't in your book but it is in the XML Schema specification.
#32
PSVI Element Annotations
[validation context] - The element ancestor that has a global element declaration. This can be the current element.
[validity] - One of 'valid', 'invalid', or 'notKnown'.
[validation attempted] - One of 'full', 'none', or 'partial'. The value 'partial' means that some children have been validated but not all.
[element declaration] - The schema element declaration used to validate this element.
There is more but these are the important ones.
#33
PSVI Attribute Annotations
[validation context] - The element ancestor that has a global element declaration. This can be the current element.
[validity] - One of 'valid', 'invalid', or 'notKnown'.
[validation attempted] - One of 'full' or 'none'.
[attribute declaration] - The schema attribute declaration used to validate this attribute.
[schema specified] - One of 'infoset' or 'schema'.
And there is more here too!
#34
Schema Components
Every major schema construct is mapped to an abstract component.
The specification says how these components can be used to accomplish validation.
The major components are:
element declaration
attribute declaration
simple and complex type definition
attribute group definition
model group definition
schema
#35
Schema Information Component
There is one schema component regardless of how many namespaces you have.
It's a "consistent world view".
From this you can get all the definitions and declarations of everything.
There's a huge wealth of information in the components.
The recommendation gives you a standard interpretation of the schema.
#36
Element Declaration Component
This is the "resolved" declaration:
[name] - the local name of the element.
[target namespace] - the namespace name of the element.
[type definition] - the type definition component.
[scope] - Either 'global' or the type definition component in which the element was declared.
From this component, you can directly access the type used to validate the element.
There's more here too!
#37
Complex Type Definition Component
This is the "resolved" definition:
[name] - the local name of the type.
[target namespace] - the namespace name of the type.
[base type definition] - the type definition component for the base type (if there is one).
From this component, you can directly access the type derivation chain.
And yet more here too!