Defining Simple Types

#1

Simple Types

• Simple types fall into one of three categories:

• atomic - an indivisible value (integer, word tokens, etc.)

• lists - lists of atomic types

• unions - unions of the values of atomic or list types.

• All types, even the built-in ones, are created by derivation within these categories.

#2

Values

• All simple types have:

• value space - The set of values for a datatype.

• lexical space - The set of literals (character strings) for the values in the value space.

• facets - each facet is a single defining aspect of the value space (e.g. minimum value, maximum value, etc.)

• Simple types are created or derived by restricting these facets.

#3

Primitive vs. Derived

Definition 1:

Primitive datatypes are those that are not defined in terms of other datatypes (i.e. ab initio).

Definition 2:

Derived datatypes are those that are defined in terms of other datatypes.

Primitive types: string, boolean, decimal, float, double, duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth, hexBinary, base64Binary, anyURI, QName, NOTATION

#4

Fundamental Facets

• A fundamental facet is an abstract property that characterizes a value space.

• Keep in mind that this is a facet of the value space, not the lexical space.

• These facets are:

• equal - every value space has the notion of equality.

• ordered - a value space may be unordered, partially ordered, or totally ordered.

• bounded - whether the value space has both an upper and a lower bound.

• cardinality - finite or countably infinite (XML Schema has no uncountably infinite value spaces).

• numeric - a boolean indicating whether the values are numeric quantities.

#5

Constraining Facets

• A constraining facet is an optional property that can be applied to constrain a value space.

• Some constraining facets, such as pattern, constrain the lexical space instead.

• For example, to restrict to the even integers between 1 and 100:

• Use minInclusive and maxInclusive to restrict the range to 2-100.

• Use a regular expression to restrict the last digit to 0, 2, 4, 6, or 8.

• The next few slides will go through the constraining facets.
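The even-integer example can be sketched as a single restriction. This is a minimal sketch that assumes, as in the other examples, that `xs` is bound to `http://www.w3.org/2001/XMLSchema`; the type name `EvenInteger` is hypothetical. The range facets act on the value space, while the pattern acts on the lexical form.

```xml
<!-- hypothetical type: even integers from 2 to 100 -->
<xs:simpleType name="EvenInteger">
  <xs:restriction base="xs:integer">
    <!-- value-space facets: 2 through 100, inclusive -->
    <xs:minInclusive value="2"/>
    <xs:maxInclusive value="100"/>
    <!-- lexical-space facet: the literal must end in an even digit -->
    <xs:pattern value="[0-9]*[02468]"/>
  </xs:restriction>
</xs:simpleType>
```

With this definition "42" is valid, "43" fails the pattern, and "102" fails maxInclusive. A side effect of the pattern is that a signed literal such as "+42" is rejected even though its value is in range.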

#6

Type Derivation by Restricting

• You derive new simple types by restricting existing types.

• The basic structure is:

```xml
<xs:simpleType name="sometype">
  <xs:restriction base="parent">
    ...
  </xs:restriction>
</xs:simpleType>
```

• The 'base' attribute points to the type you are restricting.

• All values must be valid values in the base type's value space.

• Place the constraining facets inside the 'restriction' element.
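A restriction can itself serve as the base of a further restriction, and each step can only narrow the value space. A minimal sketch of such a chain; both type names are hypothetical, and it assumes the schema has no target namespace so the unprefixed base name `Percentage` resolves.

```xml
<!-- hypothetical base type: a percentage from 0 to 100 -->
<xs:simpleType name="Percentage">
  <xs:restriction base="xs:decimal">
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="100"/>
  </xs:restriction>
</xs:simpleType>

<!-- hypothetical derived type: narrows Percentage to 60-100 -->
<xs:simpleType name="PassingPercentage">
  <xs:restriction base="Percentage">
    <!-- the upper bound of 100 is inherited from Percentage -->
    <xs:minInclusive value="60"/>
  </xs:restriction>
</xs:simpleType>
```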

#7

Range Restrictions

• Use these elements:

• minInclusive - a minimum where the boundary value can be used.

• maxInclusive - a maximum where the boundary value can be used.

• minExclusive - a minimum where the boundary value cannot be used.

• maxExclusive - a maximum where the boundary value cannot be used.

• The minimum must not be greater than the maximum.

• The integers from 2 to 5, inclusive:

```xml
<xs:simpleType name="score">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="2"/>
    <xs:maxInclusive value="5"/>
  </xs:restriction>
</xs:simpleType>
```
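The exclusive facets work the same way but leave out the boundary values. A minimal sketch with a hypothetical `ProperFraction` type:

```xml
<!-- hypothetical type: decimals strictly greater than 0 and strictly less than 1 -->
<xs:simpleType name="ProperFraction">
  <xs:restriction base="xs:decimal">
    <xs:minExclusive value="0"/>
    <xs:maxExclusive value="1"/>
  </xs:restriction>
</xs:simpleType>
```

Here 0.5 and 0.999 are valid, while 0 and 1 themselves are not.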

#8

Length Restrictions

• Use these elements:

• length - the exact length of the value (characters for strings, items for lists).

• minLength - the minimum length of the value.

• maxLength - the maximum length of the value.

• minLength must not be greater than maxLength.

• A code of exactly two characters, and a name of 2 to 50 characters:

```xml
<!-- a string of exactly two characters -->
<xs:simpleType name="code">
  <xs:restriction base="xs:string">
    <xs:length value="2"/>
  </xs:restriction>
</xs:simpleType>

<!-- a string of 2 to 50 characters -->
<xs:simpleType name="name">
  <xs:restriction base="xs:string">
    <xs:minLength value="2"/>
    <xs:maxLength value="50"/>
  </xs:restriction>
</xs:simpleType>
```

#9

Digit Restrictions

• These facets apply only to xs:decimal and the types derived from it.

• Use these elements:

• totalDigits - the maximum number of digits (including fractional digits).

• fractionDigits - the maximum number of fractional digits.

• fractionDigits must not be greater than totalDigits.

• A money amount less than $1 million, with at most two fractional digits:

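```xml
<xs:simpleType name="amount">
  <xs:restriction base="xs:decimal">
    <xs:totalDigits value="8"/>
    <xs:fractionDigits value="2"/>
    <!-- totalDigits alone would still allow 99999999, so cap the value explicitly -->
    <xs:maxExclusive value="1000000"/>
  </xs:restriction>
</xs:simpleType>
```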

#10

Whitespace Restrictions

• The 'whiteSpace' element controls the processing of whitespace according to these values:

• preserve - all whitespace is preserved.

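• replace - replaces each tab, line feed, and carriage return with a space.

• collapse - does what replace does, then removes leading and trailing spaces and collapses each run of spaces into a single space.

• Ensuring a type name has no whitespace (collapse strips the surrounding whitespace; the pattern rejects any whitespace left inside):

```xml
<xs:simpleType name="TypeName">
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="collapse"/>
    <!-- after collapsing, require one or more non-whitespace characters -->
    <xs:pattern value="\S+"/>
  </xs:restriction>
</xs:simpleType>
```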

#11

Enumerated Value Restrictions

• The 'enumeration' element specifies an exact value.

• You can have more than one of these to specify different values.

• The enumeration value must be an instance of the base type.

• A status keyword enumeration:

```xml
<xs:simpleType name="Status">
  <xs:restriction base="xs:string">
    <xs:enumeration value="draft"/>
    <xs:enumeration value="last call"/>
    <xs:enumeration value="proposed recommendation"/>
    <xs:enumeration value="unknown"/>
  </xs:restriction>
</xs:simpleType>
```

#12

Combining Restrictions

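• You can combine several facets in one restriction; a value must satisfy every facet and match at least one of the enumeration values:

```xml
<xs:simpleType name="TwoDigitPrime">
  <xs:restriction base="xs:integer">
    <!-- at most two digits (length does not apply to integers) -->
    <xs:totalDigits value="2"/>
    <xs:enumeration value="23"/>
    <xs:enumeration value="29"/>
    <xs:enumeration value="31"/>
    <xs:enumeration value="37"/>
  </xs:restriction>
</xs:simpleType>
```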

#13

Pattern Restrictions

• The 'pattern' element specifies a regular expression for the lexical value.

• The book has a whole section on patterns. Read it!!!

• If you are familiar with regular expressions, then you'll feel at home.

• The syntax is similar to that of Java's java.util.regex package, but a pattern must match the entire lexical value, so no ^ or $ anchors are needed.

• A US Zip+4 code:

```xml
<xs:simpleType name="ZipPlus4">
  <xs:restriction base="xs:string">
    <xs:pattern value="[0-9]{5}(-[0-9]{4})?"/>
  </xs:restriction>
</xs:simpleType>
```

#14

Regular Expressions - Quick Tour

• A regular expression consists of one or more branches separated by '|' (spaces are significant, so don't pad the '|'):

`pattern1|pattern2|pattern3`

• A branch is a sequence of atoms, each of which can have a quantifier.

• An atom is:

• A character 'a' or character class '[a-z]'.

• A parenthesized regular expression: (a|b)

• An escape (e.g. \d or \p{IsBasicLatin})
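These pieces combine inside a pattern facet. A minimal sketch; the type name `Quota` and its allowed values are hypothetical. The pattern has three branches, and the third uses a character class with a quantifier followed by an optional parenthesized group.

```xml
<!-- hypothetical type: "none", "unlimited", or a number with an optional unit -->
<xs:simpleType name="Quota">
  <xs:restriction base="xs:string">
    <!-- branch 1: none, branch 2: unlimited, branch 3: digits plus optional KB or MB -->
    <xs:pattern value="none|unlimited|[0-9]+(KB|MB)?"/>
  </xs:restriction>
</xs:simpleType>
```

This accepts "none", "512", and "512MB", but not "512 MB", because the space is not part of the pattern.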

#15

Regular Expressions - Character Classes & Escapes

• Special characters such as tabs or parentheses can be written as escapes.

• For example, the tab character is \t and a left parenthesis is \(

• Several character classes have shorthand escapes:

• \d is any decimal digit and \D is any character that is not a decimal digit.

• \s is a whitespace character and \S is any non-whitespace character.

• \w is any character not considered punctuation, separators, or other by Unicode and \W is the opposite.

• See pages 187-188 for more information.
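A pattern facet can mix literal escapes with class escapes. A minimal sketch with a hypothetical `PhoneUS` type for values such as "(555) 123-4567":

```xml
<!-- hypothetical type: a US phone number such as "(555) 123-4567" -->
<xs:simpleType name="PhoneUS">
  <xs:restriction base="xs:string">
    <!-- \( and \) are escaped parentheses; \s? allows one optional whitespace character -->
    <xs:pattern value="\(\d{3}\)\s?\d{3}-\d{4}"/>
  </xs:restriction>
</xs:simpleType>
```

Note that \d matches any Unicode decimal digit, not only 0-9; use [0-9] when only ASCII digits should be accepted.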

#16

Regular Expressions - Block Escapes

• You can also refer to Unicode character blocks by name.

• Two forms:

• \p{IsXXX} - any character in the block 'XXX'

• \P{IsXXX} - any character not in the block 'XXX'

• Example:

• \p{IsGreek} - any character in the Greek block

• \P{IsMathematicalOperators} - any character that is not a mathematical operator.

• See pages 190-195 for more information.
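A block escape is an atom like any other, so it can take a quantifier. A minimal sketch with a hypothetical `GreekWord` type:

```xml
<!-- hypothetical type: one or more characters from the Unicode Greek block -->
<xs:simpleType name="GreekWord">
  <xs:restriction base="xs:string">
    <xs:pattern value="\p{IsGreek}+"/>
  </xs:restriction>
</xs:simpleType>
```

Accented letters written as a base letter plus a separate combining mark would be rejected, because combining marks belong to a different block.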

#17

Regular Expressions - Quantifiers

• Any atom can have a quantifier as a suffix.

• Basic forms

• ? - optional (0 or 1 times).

• * - zero or more times.

• + - one or more times.

• Ranges

• {n} - exactly n times.

• {n,} - n or more times.

• {n,m} - between n and m times, inclusive.
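The range quantifiers in a pattern facet, sketched with a hypothetical `ProductCode` type for values such as "AB-1234" or "XYZ-123456":

```xml
<!-- hypothetical type: two or three capital letters, a hyphen, then four or more digits -->
<xs:simpleType name="ProductCode">
  <xs:restriction base="xs:string">
    <!-- {2,3}: two or three letters; {4,}: four or more digits -->
    <xs:pattern value="[A-Z]{2,3}-[0-9]{4,}"/>
  </xs:restriction>
</xs:simpleType>
```

"AB-1234" and "XYZ-123456" are valid; "A-1234" (one letter) and "AB-123" (three digits) are not.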

#18

Regular Expressions - Examples

• A US zip code: \d{5}(-\d{4})?

• A US social security number: \d{3}-\d{2}-\d{4}

• A street address with the house number first: \d+\s+\S.*

#19

Atomic Values and Types

• An atomic value is a single, indivisible value (e.g. "token", "10", "true").

• Simple types are atomic if they are neither lists nor unions.

• This becomes important when you have lists of values:

• A list of integers:

`1 2 3 4 5 6 7 8 9 10`

This has ten values (duh!).

• A list of strings:

```
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
```

This is a list of 18 strings, not two or one.

#20

List Simple Types

• A list simple type specifies a list of values from an atomic type.

• A conforming value may be an empty list unless the type adds a minLength facet (the built-in xs:NMTOKENS, for example, requires at least one item this way).

• Example:

```xml
<xs:simpleType name="IntegerVector">
  <xs:list itemType="xs:integer"/>
</xs:simpleType>
```

• An instance:

`10 -2 5 12 3`
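A list type can itself be restricted, and on a list the length facets count items rather than characters. A minimal sketch; `Point3D` is a hypothetical name, and it assumes the schema has no target namespace so the unprefixed `IntegerVector` resolves.

```xml
<!-- hypothetical type: exactly three integers, e.g. "10 -2 5" -->
<xs:simpleType name="Point3D">
  <xs:restriction base="IntegerVector">
    <!-- on a list type, length counts list items -->
    <xs:length value="3"/>
  </xs:restriction>
</xs:simpleType>
```

Under this type the five-item instance above is invalid, while "10 -2 5" is valid. In the same way, a minLength of 1 would rule out the empty list.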

#21

List Simple Types - Inlining Types

• You can define the item type inline, inside the list element.

• Example:

```xml
<xs:simpleType name="MenShoeSizesUS">
  <xs:list>
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="7"/>
        <xs:maxInclusive value="14"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:list>
</xs:simpleType>
```

• An instance:

`9 10 14`

#22

Union Simple Types

• A union simple type accepts any value of its member types, which can be named in the memberTypes attribute, defined inline, or both.

• Example:

```xml
<xs:simpleType name="size">
  <xs:union memberTypes="xs:integer">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="small"/>
        <xs:enumeration value="medium"/>
        <xs:enumeration value="large"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>
```
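To see the union at work, a minimal sketch that declares a hypothetical `shirtSize` element of type `size` (assuming no target namespace) and shows some instance values:

```xml
<!-- schema side: a hypothetical element that uses the size union -->
<xs:element name="shirtSize" type="size"/>

<!-- instance side: each value is accepted by one member type -->
<shirtSize>42</shirtSize>      <!-- matches the xs:integer member -->
<shirtSize>medium</shirtSize>  <!-- matches the inline enumeration member -->
```

A value such as "huge" is rejected because no member type accepts it. Member types are tried in order: those named in memberTypes first, then the inline ones.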

#23

String-based Built-ins

• xs:string - has whiteSpace facet value of 'preserve'.

• xs:normalizedString - has whiteSpace facet value of 'replace'.

• xs:token - has whiteSpace facet value of 'collapse'.

• Examples (the same input under each type):

• xs:string (unchanged):

```
  leading
trailing   and inter-word. 
```

• xs:normalizedString (the line feed becomes a space):

```
  leading trailing   and inter-word. 
```

• xs:token (trimmed and collapsed):

```
leading trailing and inter-word.
```
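One practical consequence: whitespace is normalized before facets are checked, so an enumeration derived from xs:token tolerates stray spaces that the same enumeration on xs:string would reject. A minimal sketch with a hypothetical `TokenStatus` type modeled on the Status enumeration:

```xml
<!-- hypothetical: like the Status enumeration, but based on xs:token -->
<xs:simpleType name="TokenStatus">
  <xs:restriction base="xs:token">
    <!-- whitespace is collapsed before the enumeration is checked -->
    <xs:enumeration value="draft"/>
    <xs:enumeration value="last call"/>
  </xs:restriction>
</xs:simpleType>
```

An instance value of "  last   call " collapses to "last call" and is accepted here; with base="xs:string" it would be rejected.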

#24

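Unbounded Integer Built-ins

• xs:integer - derived from xs:decimal by setting fractionDigits to 0.

• xs:nonPositiveInteger - derived from xs:integer by setting maxInclusive to 0.

• xs:negativeInteger - derived from xs:nonPositiveInteger by setting maxInclusive to -1.

• xs:nonNegativeInteger - derived from xs:integer by setting minInclusive to 0.

• xs:positiveInteger - derived from xs:nonNegativeInteger by setting minInclusive to 1.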

#25

Bounded Integer Built-ins

• xs:long - derived from xs:integer by setting maxInclusive to 9223372036854775807 and minInclusive to -9223372036854775808.

• xs:int - derived from xs:long by setting maxInclusive to 2147483647 and minInclusive to -2147483648.

• xs:short - derived from xs:int by setting maxInclusive to 32767 and minInclusive to -32768.

• xs:byte - derived from xs:short by setting maxInclusive to 127 and minInclusive to -128.

#26

Bounded Unsigned Integer Built-ins

• xs:unsignedLong - derived from xs:nonNegativeInteger by setting maxInclusive to 18446744073709551615.

• xs:unsignedInt - derived from xs:unsignedLong by setting maxInclusive to 4294967295.

• xs:unsignedShort - derived from xs:unsignedInt by setting maxInclusive to 65535.

• xs:unsignedByte - derived from xs:unsignedShort by setting maxInclusive to 255.

#27

Floating Point

• xs:float - IEEE 754 single-precision 32-bit floating point number.

• xs:double - IEEE 754 double-precision 64-bit floating point number.

#28

Miscellaneous Types

• xs:boolean - Values are true and false; the lexical forms are 'true', 'false', '1', and '0'.

• xs:anyURI - A URI as defined in [RFC 2396] and as amended by [RFC 2732].

• xs:hexBinary - represents arbitrary hex-encoded binary data (e.g. 0FB7).

• xs:base64Binary - binary data encoded in base64 as specified in [RFC 2045].

#29

Dates, Times, and Durations

• Durations are periods of time that are not anchored to a specific date or time.

• Dates are with respect to the Gregorian calendar.

• Values that carry a timezone are compared in Coordinated Universal Time (UTC, sometimes called "Greenwich Mean Time"); values without a timezone are not assumed to be UTC.

#30

Dates & Times

• xs:date - an ISO 8601 date:

• A leading minus sign ('-') means BCE (before the common era).

• The common format is 'YYYY-MM-DD'.

• A date value is the one-day interval that starts at the first moment of that day; you cannot add a time, but you can append an optional timezone (e.g. 2005-03-10Z or 2005-03-10-08:00).

• xs:date supports the same constraining facets as xs:dateTime.

• xs:dateTime - an ISO 8601 date plus a time.

• The date is separated from the time by a 'T': 2005-03-10T02:00:00-08:00

• The time format is 'hh:mm:ss' with optional fractional seconds and an optional timezone offset.

• Use 'Z' for UTC or an offset such as -08:00; a value with neither has no timezone (it is not assumed to be Greenwich Mean Time).

• The constraining facets are: pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive.
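Because dates are ordered, the range facets listed above apply to them. A minimal sketch with a hypothetical `Date2005` type that accepts only dates in 2005:

```xml
<!-- hypothetical type: any date in the year 2005 -->
<xs:simpleType name="Date2005">
  <xs:restriction base="xs:date">
    <xs:minInclusive value="2005-01-01"/>
    <xs:maxInclusive value="2005-12-31"/>
  </xs:restriction>
</xs:simpleType>
```

If instance values mix timezoned and untimezoned dates, comparisons near the boundaries can be indeterminate, so it is safest to use timezones consistently.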

#31

Durations

• xs:duration - an ISO 8601 duration

• The syntax is PnYnMnDTnHnMnS where each 'n' is a number.

• Y - years, M - months, D - days, H - hours, M - minutes, S - seconds (an 'M' before the 'T' means months; after it, minutes).

• A minus sign prefix signifies a "negative duration".

• You can omit fields, but hours, minutes, and seconds must always follow the 'T' separator, even when no years, months, or days are given.

• Examples:

• 1 month: P1M

• 2 days, 3 hours: P2DT3H

• 30 months: P30M

• 30 minutes: PT30M

• 1 hour 30 minutes: PT1H30M
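The range facets also apply to xs:duration. A minimal sketch with a hypothetical `MeetingLength` type:

```xml
<!-- hypothetical type: a meeting length between 15 minutes and 8 hours -->
<xs:simpleType name="MeetingLength">
  <xs:restriction base="xs:duration">
    <xs:minInclusive value="PT15M"/>
    <xs:maxInclusive value="PT8H"/>
  </xs:restriction>
</xs:simpleType>
```

PT1H30M from the examples above is valid, while PT10M falls below the minimum. xs:duration is only partially ordered (P1M versus P30D depends on the month), so bounds expressed in hours and minutes keep the comparisons unambiguous.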