Basic ComplexType Derivation and Substitution Groups

#1

Complex Type Definitions - Re-useable Types

You've been declaring types inside the element declarations:

<xs:element name="person">
<xs:complexType>
  <xs:sequence>
     <xs:element name="name" type="xs:string"/>
     <xs:element name="age" type="xs:integer"/>
  </xs:sequence>
</xs:complexType>
</xs:element>

But you can also name a complex type and declare it at the top level:

<xs:complexType name="Person">
  <xs:sequence>
     <xs:element name="name" type="xs:string"/>
     <xs:element name="age" type="xs:integer"/>
  </xs:sequence>
</xs:complexType>

And then you point to it from element declarations:

<xs:element name="person" type="my:Person"/>
<xs:element name="student" type="my:Person"/>
<xs:element name="parent" type="my:Person"/>

Keep in mind that a type never appears in an instance document by itself; it is only ever used through an element declaration.
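
The 'my:' prefix used above has to be bound to the schema's target namespace somewhere. Here is a minimal sketch of a complete schema document around these declarations. The namespace URI 'http://example.com/my' is a placeholder assumption, not something from the slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical namespace URI; substitute your own. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="http://example.com/my"
           targetNamespace="http://example.com/my">

  <!-- Named, re-usable type declared at the top level -->
  <xs:complexType name="Person">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="age" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Several elements sharing the one type -->
  <xs:element name="person" type="my:Person"/>
  <xs:element name="student" type="my:Person"/>
  <xs:element name="parent" type="my:Person"/>
</xs:schema>
```

An instance of one of these elements could then look like this. The children are unqualified because 'elementFormDefault' is left at its default.

```xml
<my:student xmlns:my="http://example.com/my">
  <name>Jimmy</name>
  <age>12</age>
</my:student>
```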

#2

Complex Type Derivation

Once you name your types you can use them to build other types.
You can derive types from other types.
Just like in OO languages, an instance of the derived type is also treated as an instance of its super type.
But what does that mean for XML?

#3

"Valid against the Super Type"

Each parent-child relationship of the super type is preserved.
The order of the children of the super type is preserved.
The attributes of the super type are preserved.
If something is optional, it's OK to disallow it or require it in the derived type.
If something occurs multiple times, it may be OK to restrict that occurrence.

#4

Extensions and Restrictions

From the previous slide, we are allowed to:
1. Add content to the end of the parent type's content.
2. Add attributes.
3. Disallow optional elements/attributes.
4. Require optional elements/attributes.
(1) and (2) are "extensions" - i.e. content has been added.
(3) and (4) are "restrictions" - i.e. content has been removed or restricted.

#5

Example

Each of these elements could have a super type:

<t:student id="s1">
<name>Jimmy</name>
</t:student>

<t:parent id="p1">
<name>Jimmy's Mom</name>
<address>
<street>123 Mars</street>
<city>San Francisco</city><state>CA</state>
</address>
<children>
<student ref="s1"/>
</children>
</t:parent>

<t:teacher>
<name>Dr. Evil</name>
<students>
<student ref="s1"/>
</students>
</t:teacher>

These could all have a base type of:

<xs:complexType name="PersonInfo">
   <xs:sequence>
      <xs:element name="name" type="xs:string"/>
   </xs:sequence>
   <xs:attribute name="id" type="xs:ID"/>
</xs:complexType>

#6

Example - Extension vs. Restriction

Schema gives you these derivation options for complex types:
- extension - adding elements to the end of the children or adding attributes.
  
  Example: Teacher can add 'students' to the 'PersonInfo' type.
- restriction - removing optional elements or attributes.
  
  Example: Teacher can remove the optional 'id' attribute.
Simple types can only be restricted within their "value space":

Example: The integers 1 through 10.
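
The three options above can be sketched in one schema. This is only an illustration under assumptions: the namespace URI, the shape of 'students', and the type names 'AnonymousPersonInfo' and 'OneToTen' are made up for the sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="http://example.com/my"
           targetNamespace="http://example.com/my">

  <!-- Base type from the previous slide -->
  <xs:complexType name="PersonInfo">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID"/>
  </xs:complexType>

  <!-- Extension: 'students' is appended after the inherited 'name' -->
  <xs:complexType name="Teacher">
    <xs:complexContent>
      <xs:extension base="my:PersonInfo">
        <xs:sequence>
          <xs:element name="students">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="student" maxOccurs="unbounded">
                  <xs:complexType>
                    <xs:attribute name="ref" type="xs:IDREF" use="required"/>
                  </xs:complexType>
                </xs:element>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Restriction: repeat the content model and prohibit the optional 'id' -->
  <xs:complexType name="AnonymousPersonInfo">
    <xs:complexContent>
      <xs:restriction base="my:PersonInfo">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="id" type="xs:ID" use="prohibited"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

  <!-- Simple type restricted within its value space: the integers 1 through 10 -->
  <xs:simpleType name="OneToTen">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="10"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Note that a single derivation step is either an extension or a restriction, so a type that both adds 'students' and drops 'id' would need two steps.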

#7

Simple Type Extension to a Complex Type

The 'simpleContent' element can be used to extend a simple type to add attributes.
The 'extension' element child contains some number of attribute declarations.
The content of the element remains typed as the simple type referenced via the 'base' attribute on 'extension'.

Example:

<xs:complexType name="Anchor">
   <xs:simpleContent>
     <xs:extension base="xs:string">
       <xs:attribute name="href" type="xs:anyURI"/>
     </xs:extension>
   </xs:simpleContent>
</xs:complexType>

There is a child called 'restriction' that can be used in place of 'extension'. That will be discussed in the lecture on simple types.
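
To see what the 'Anchor' type permits, here is a small sketch that declares an element of that type. The element name 'a' and the namespace URI are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="http://example.com/my"
           targetNamespace="http://example.com/my">

  <!-- Text content typed as xs:string, plus one attribute -->
  <xs:complexType name="Anchor">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="href" type="xs:anyURI"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <!-- Hypothetical element using the type -->
  <xs:element name="a" type="my:Anchor"/>
</xs:schema>
```

A matching instance has text content and the attribute, but no child elements:

```xml
<my:a xmlns:my="http://example.com/my" href="http://example.com/">Example site</my:a>
```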

#8

Complex Type Extension - Adding Attributes

A complex type can be extended to add some number of attribute declarations via the 'complexContent' element.
The 'extension' element child contains some number of attribute declarations.
The content of the element is the same as the type referenced via the 'base' attribute on 'extension'.

Example:

<xs:complexType name="Person">
   <xs:sequence>
   <xs:element ref="my:name"/>
   <xs:element ref="my:major"/>
   </xs:sequence>
</xs:complexType>

<xs:complexType name="GradedPerson">
   <xs:complexContent>
      <xs:extension base="my:Person">
         <xs:attribute name="grade" type="xs:string"/>
      </xs:extension>
   </xs:complexContent>
</xs:complexType>

#9

Complex Type Extension - Adding Elements

A complex type can be extended to add some number of additional element children.
The 'extension' element child contains a model group (sequence, choice, all, or a group reference) holding the new elements, placed before any attributes.
The content of the element is the base type's content plus the content added by the extension.

Example:

<xs:complexType name="Person">
   <xs:sequence>
   <xs:element ref="my:name"/>
   <xs:element ref="my:major"/>
   </xs:sequence>
</xs:complexType>

<xs:complexType name="LocatedPerson">
   <xs:complexContent>
      <xs:extension base="my:Person">
         <xs:sequence>
            <xs:element ref="my:address"/>
         </xs:sequence>
      </xs:extension>
   </xs:complexContent>
</xs:complexType>

#10

Complex Type Extension - Adding Elements/Attributes

You can do this at the same time and as many times as you want:

<xs:complexType name="Person">
   <xs:sequence>
   <xs:element ref="my:name"/>
   <xs:element ref="my:major"/>
   </xs:sequence>
</xs:complexType>

<xs:complexType name="ExtendedPerson">
   <xs:complexContent>
      <xs:extension base="my:Person">
         <xs:sequence>
            <xs:element ref="my:address"/>
            <xs:element ref="my:extras"/>
         </xs:sequence>
         <xs:attribute name="grade" type="xs:string"/>
         <xs:attribute name="level" type="xs:string"/>
         <xs:attribute name="birth-date" type="xs:date"/>
      </xs:extension>
   </xs:complexContent>
</xs:complexType>
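
Because extension appends to the base type's content, the effective content model of 'ExtendedPerson' is roughly what you would get by writing it out by hand. The sketch below shows that flattened equivalent. The namespace URI, the string-typed element declarations, and the name 'ExtendedPersonFlattened' are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="http://example.com/my"
           targetNamespace="http://example.com/my">

  <!-- Assumed global elements; the slides only reference them -->
  <xs:element name="name" type="xs:string"/>
  <xs:element name="major" type="xs:string"/>
  <xs:element name="address" type="xs:string"/>
  <xs:element name="extras" type="xs:string"/>

  <xs:complexType name="ExtendedPersonFlattened">
    <xs:sequence>
      <!-- Content inherited from 'Person' comes first -->
      <xs:sequence>
        <xs:element ref="my:name"/>
        <xs:element ref="my:major"/>
      </xs:sequence>
      <!-- Content added by the extension follows -->
      <xs:sequence>
        <xs:element ref="my:address"/>
        <xs:element ref="my:extras"/>
      </xs:sequence>
    </xs:sequence>
    <xs:attribute name="grade" type="xs:string"/>
    <xs:attribute name="level" type="xs:string"/>
    <xs:attribute name="birth-date" type="xs:date"/>
  </xs:complexType>
</xs:schema>
```

The flattened type accepts the same content but is not derived from 'Person', so it loses the type relationship that the rest of this lecture relies on.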

#11

What Happens to the Instance?

If elements are declared of the derived type, everything is the same.
Just like in OO languages, we want some kind of polymorphic content so that derived type instances can be substituted for their super types.
That is, we should be able to use instances of 'ExtendedPerson' in place of instances of 'Person'.
XML Schema allows this but there are some strange consequences (which can be fixed).

#12

The Teacher/Parents/Students Example

We want to model teachers who have students and the parents of the students.
Naively, we start with a nice extension hierarchy: teacher.xsd

The key bit is this part:

<xs:element name="district" type="my:District"/>
<xs:complexType name="District">
   <xs:sequence>
      <xs:element ref="my:person" maxOccurs="unbounded"/>
   </xs:sequence>
</xs:complexType>

<xs:element name="person" type="my:PersonInfo"/>
<xs:complexType name="PersonInfo">
   <xs:sequence>
      <xs:element name="name" type="xs:string"/>
   </xs:sequence>
</xs:complexType>

Here we reference the "base type" of "PersonInfo" via an element in our content model for the type 'District'.

Regardless of type, the element name must be 'my:person', so the elements student/parent/teacher can't be used.

#13

Making the Content Work with xsi:type

We can make the content work by asserting the type and changing the content to match that new type.
The type must be derived from the type of the element allowed in that position.

For the teacher.xsd schema, it looks something like this:

<t:district xmlns:t='...'>
<t:person xsi:type='t:Student' id=''>...</t:person>
<t:person xsi:type='t:Teacher'>...</t:person>
</t:district>

The fully worked example is here.
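
One detail the snippet above leaves out is that the 'xsi' prefix must be bound to the XML Schema instance namespace. Here is a sketch of a complete instance. The 't' namespace URI, the 'id' value, and the assumption that both derived types need only a 'name' child are placeholders, since teacher.xsd is not reproduced here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- 'http://example.com/teacher' is a hypothetical namespace URI -->
<t:district xmlns:t="http://example.com/teacher"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- The element name stays 'person'; xsi:type names the derived type -->
  <t:person xsi:type="t:Student" id="s1">
    <name>Jimmy</name>
  </t:person>
  <t:person xsi:type="t:Teacher">
    <name>Dr. Evil</name>
  </t:person>
</t:district>
```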

#14

xsi:type Wholly Unsatisfactory

I don't know about you...
...but that looks awful strange and smells like a big hack.
And it is really broken for XSLT/XPath where you expect to match on 'student' or 'teacher'.

What we want is:

<t:district xmlns:t='...'>
<t:student id=''>...</t:student>
<t:teacher>...</t:teacher>
</t:district>

without changing the content model of 'district'!!!

#15

The Solution is Substitution Groups

A substitution group is a collection of elements that are allowed to be substituted for a particular element.
In the district example, we'll set up a substitution group for 'person' so that 'student', 'teacher', or 'parent' can appear as children of 'district'.

One way to think of this is as an extensible choice. The content model of 'district' could have been:

<xs:choice>
<xs:element ref="my:student"/>
<xs:element ref="my:parent"/>
<xs:element ref="my:teacher"/>
</xs:choice>

#16

The Mechanism of Substitution Groups

Substitution groups are established on the element declaration via the 'substitutionGroup' attribute.
The value is a QName of the element for which this element can be substituted.
The only restriction is that the element's type must be derived from the type of the substituted element. That is, the type of 'student' must be derived from the type of 'person' (which it is).
So, each of the elements student, parent, teacher in the schema should have a substitutionGroup attribute with a value of 'my:person'.
The fixed teacher.xsd should validate this content.
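
As a rough sketch of what those declarations could look like (not the actual teacher.xsd): the namespace URI and the derived 'Student' type are assumptions, and 'teacher' and 'parent' simply reuse the head's type here, which is also allowed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="http://example.com/teacher"
           targetNamespace="http://example.com/teacher">

  <!-- The content model of 'district' is unchanged -->
  <xs:element name="district" type="my:District"/>
  <xs:complexType name="District">
    <xs:sequence>
      <xs:element ref="my:person" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Head of the substitution group -->
  <xs:element name="person" type="my:PersonInfo"/>
  <xs:complexType name="PersonInfo">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Hypothetical derived type: adds an 'id' attribute -->
  <xs:complexType name="Student">
    <xs:complexContent>
      <xs:extension base="my:PersonInfo">
        <xs:attribute name="id" type="xs:ID"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Members: each names the head in 'substitutionGroup' -->
  <xs:element name="student" type="my:Student" substitutionGroup="my:person"/>
  <xs:element name="teacher" type="my:PersonInfo" substitutionGroup="my:person"/>
  <xs:element name="parent" type="my:PersonInfo" substitutionGroup="my:person"/>
</xs:schema>
```

With that in place, an instance can use the member element names directly:

```xml
<t:district xmlns:t="http://example.com/teacher">
  <t:student id="s1"><name>Jimmy</name></t:student>
  <t:teacher><name>Dr. Evil</name></t:teacher>
</t:district>
```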

#17

Generalized Substitutions

You can make the representative of the substitution group be an element that you'll never use.
Making the type of that element 'xs:anyType' effectively makes the type derivation restriction go away, since every type is derived from 'xs:anyType'.

Example:

<xs:element name="inline" type="xs:anyType"/>
<xs:element name="stop" substitutionGroup="my:inline">
<xs:complexType/>
</xs:element>

#18

Example - XHTML Blocks and Inlines

I've created my own XHTML schema that uses substitution groups.
This makes XHTML more extensible.
I have two generalized substitution groups for inlines and blocks.
Here's the schema: xhtml.xsd

#19

Example - Extending XHTML

With the inline/block substitution groups, I can extend XHTML and add my own elements:
- Here's the schema: pseudocode.xsd
- Here's the schema: mathml.xsd

#20

Abstract Elements

An abstract element cannot be put into an instance.
They are great for making the representative member of the substitution group something that can't be used.
For example, I don't want people to use my 'inline' element from my XHTML schema:
```
<xs:element name="inline" abstract="true"/>
```
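
To show how an abstract head behaves in a content model, here is a small sketch. The elements 'em' and 'p' and the namespace URI are made up for illustration and are not taken from xhtml.xsd.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="http://example.com/my"
           targetNamespace="http://example.com/my">

  <!-- Abstract head: can be referenced in content models, never used in instances -->
  <xs:element name="inline" type="xs:anyType" abstract="true"/>

  <!-- Hypothetical members of the group -->
  <xs:element name="em" substitutionGroup="my:inline">
    <xs:complexType mixed="true"/>
  </xs:element>
  <xs:element name="stop" substitutionGroup="my:inline">
    <xs:complexType/>
  </xs:element>

  <!-- Hypothetical paragraph: text mixed with any member of the 'inline' group -->
  <xs:element name="p">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="my:inline" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance may use the members, but a literal 'my:inline' child would be rejected:

```xml
<my:p xmlns:my="http://example.com/my">Wait for it<my:stop/> then <my:em>go</my:em>.</my:p>
```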

#21

Substitution vs. xsi:type

A substitution group allows you to formally define an extension to a content model.
The xsi:type attribute allows ad-hoc extensions.
The xsi:type attribute also leaves room for unanticipated needs.
You can't change the element name with xsi:type but you can with substitution groups.
Substitution groups are better for tools (see XMLMind Demo).

#22

Demo: My Slides

These slides use substitution groups to allow more structured content in the slides.
I use this for the simple reason that it makes my slides extensible.
It also means I don't have to change my slide schema when I want to add new elements.

#23

Restricting Complex Types

When you restrict a type, you remove content possibilities.
But you can only:
- Remove optionality.
- Prohibit elements/attributes.
- Limit values (e.g. restrict them).

#24

Restricting Complex Types - Example #1

In restrictions, you repeat the content model with the restrictions in place. Here we have several optional parts:

<xs:complexType name="Product">
   <xs:sequence>
      <xs:element name="number" type="xs:string"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="my:Size" minOccurs="0"/>
      <xs:element name="color" type="my:Colors" minOccurs="0"/>
   </xs:sequence>
</xs:complexType>

Then we can restrict content to be products with a size element:

<!-- Forced to have a size -->
<xs:complexType name="ProductWithSize">
   <xs:complexContent>
   <xs:restriction base="my:Product">
   <xs:sequence>
      <xs:element name="number" type="xs:string"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="my:Size"/>
      <xs:element name="color" type="my:Colors" minOccurs="0"/>
   </xs:sequence>
   </xs:restriction>
   </xs:complexContent>
</xs:complexType>

#25

Restricting Complex Types - Example #2

We can also remove optional parts completely:

<xs:complexType name="Product">
   <xs:sequence>
      <xs:element name="number" type="xs:string"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="my:Size" minOccurs="0"/>
      <xs:element name="color" type="my:Colors" minOccurs="0"/>
   </xs:sequence>
</xs:complexType>

Here we restrict the product to the minimal required parts:

<xs:complexType name="MinimalProduct">
   <xs:complexContent>
   <xs:restriction base="my:Product">
   <xs:sequence>
      <xs:element name="number" type="xs:string"/>
      <xs:element name="name" type="xs:string"/>
   </xs:sequence>
   </xs:restriction>
   </xs:complexContent>
</xs:complexType>
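
The third permitted restriction, limiting values, works the same way: repeat the content model, but give an element a type with a narrower value space. A sketch, assuming made-up definitions for 'my:Size' and 'my:Colors' (the slides never show them) and a hypothetical 'SmallProduct' type:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="http://example.com/my"
           targetNamespace="http://example.com/my">

  <!-- Assumed definitions of the two simple types -->
  <xs:simpleType name="Size">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="20"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="Colors">
    <xs:restriction base="xs:string">
      <xs:enumeration value="red"/>
      <xs:enumeration value="blue"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Narrower value space than my:Size -->
  <xs:simpleType name="SmallSize">
    <xs:restriction base="my:Size">
      <xs:maxInclusive value="8"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:complexType name="Product">
    <xs:sequence>
      <xs:element name="number" type="xs:string"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="my:Size" minOccurs="0"/>
      <xs:element name="color" type="my:Colors" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Same structure, but 'size' now only accepts the restricted type -->
  <xs:complexType name="SmallProduct">
    <xs:complexContent>
      <xs:restriction base="my:Product">
        <xs:sequence>
          <xs:element name="number" type="xs:string"/>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="size" type="my:SmallSize" minOccurs="0"/>
          <xs:element name="color" type="my:Colors" minOccurs="0"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```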

#26

Example: Biodiversity Data - Requirements

Imagine I'm trying to interchange biodiversity data.
Each organization has its own idea of what level of specificity it needs when collecting its data.
Some of that may be operational, and some may come from specific requirements of its user community.

#27

Example: Biodiversity Data - Using Substitution

We could imagine a base class for this information that contains:
- Species/Group Identification
- Location
- Area type/habitat type
- Counts or estimates of population
- Observations

From that we could construct the following complex type:

<xs:complexType name="BioDataEntry">
<xs:sequence>
<xs:element ref="b:species"/>
<xs:element ref="b:location"/>
<xs:element ref="b:areatype"/>
<xs:element ref="b:counts" minOccurs="0"/>
<xs:element ref="b:observations" minOccurs="0"/>
</xs:sequence>
</xs:complexType>

How do we define the elements referred to above?

#28

Example: Biodiversity Data - The Twist

What happens when people don't agree on what makes up each of those elements?
What happens when different groups have conflicting requirements for specificity?
What happens to existing data that isn't directly mappable?
For example, location could be:
- GIS coordinates
- Map coordinates
- Street address
- Region name
- A polygon on a map.
- Any of the above plus relative coordinates:
  - 10 paces north
  - a depth of 10 meters
  - from 10-20 meters in the canopy.

#29

Example: Biodiversity Data - The Substitution Group Solution

Assuming no common structure, we could define them to all be abstract elements:

<xs:element name="species" type="xs:anyType" abstract="true"/>
<xs:element name="location" type="xs:anyType" abstract="true"/>
<xs:element name="areatype" type="xs:anyType" abstract="true"/>
<xs:element name="counts" type="xs:anyType" abstract="true"/>
<xs:element name="observations" type="xs:anyType" abstract="true"/>

Then we define specific kinds of species/locations/etc. and put them in the substitution group:

<xs:element name="latin-name" substitutionGroup="b:species">
<xs:complexType>
<xs:attribute name="genus" type="xs:string" use="required"/>
<xs:attribute name="species" type="xs:string" use="optional"/>
</xs:complexType>
</xs:element>

<xs:element name="gis-location" substitutionGroup="b:location">
<xs:complexType>
</xs:complexType>
</xs:element>
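
Putting the pieces together, a complete schema might look like the sketch below. The namespace URI, the 'entry' element, and the 'region-name' and 'habitat' members are hypothetical additions so that an instance can actually be written. Note that the 'name' attribute of a declaration is unprefixed; the 'b:' prefix is only used when referring to it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:b="http://example.com/bio"
           targetNamespace="http://example.com/bio">

  <!-- Abstract heads: placeholders that each organization fills in -->
  <xs:element name="species" type="xs:anyType" abstract="true"/>
  <xs:element name="location" type="xs:anyType" abstract="true"/>
  <xs:element name="areatype" type="xs:anyType" abstract="true"/>
  <xs:element name="counts" type="xs:anyType" abstract="true"/>
  <xs:element name="observations" type="xs:anyType" abstract="true"/>

  <xs:complexType name="BioDataEntry">
    <xs:sequence>
      <xs:element ref="b:species"/>
      <xs:element ref="b:location"/>
      <xs:element ref="b:areatype"/>
      <xs:element ref="b:counts" minOccurs="0"/>
      <xs:element ref="b:observations" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Hypothetical wrapper element for one record -->
  <xs:element name="entry" type="b:BioDataEntry"/>

  <xs:element name="latin-name" substitutionGroup="b:species">
    <xs:complexType>
      <xs:attribute name="genus" type="xs:string" use="required"/>
      <xs:attribute name="species" type="xs:string" use="optional"/>
    </xs:complexType>
  </xs:element>

  <!-- Hypothetical members for 'location' and 'areatype' -->
  <xs:element name="region-name" substitutionGroup="b:location">
    <xs:complexType>
      <xs:attribute name="name" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="habitat" substitutionGroup="b:areatype">
    <xs:complexType>
      <xs:attribute name="kind" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance with made-up sample values would then read:

```xml
<b:entry xmlns:b="http://example.com/bio">
  <b:latin-name genus="Sequoia" species="sempervirens"/>
  <b:region-name name="Muir Woods"/>
  <b:habitat kind="forest"/>
</b:entry>
```

Another organization could add its own 'location' member in a separate schema without touching 'BioDataEntry'.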

#30

Example: Biodiversity Data - Why this solution?

Using a substitution group in interchange allows the different databases & organizations to exchange data without everything being directly mappable.
Consortiums can define broad standards without restricting what data is encoded as the model is extensible.
The model is inherently an ontology-based information exchange.
Individuals/projects/etc. can decide to what level they will encode and exchange their information.

#31

Demo: Mathdoc Schemas

These are a set of schemas for structuring scientific content.
The substitution group is the main mechanism for making the content extensible.