Defining Simple Types
R. Alexander Milowski
milowski at sims.berkeley.edu
#1
Simple Types
Simple types fall into one of three categories:
atomic - an indivisible value (integer, word tokens, etc.)
lists - lists of atomic types
unions - unions of the values for atomic or list types.
All types are created--even the built-in types--by derivations amongst the above categories.
#2
Values
All simple types have:
value space - The set of values for a datatype.
lexical space - The set of literals (character strings) for the values in the value space.
facets - each facet is a single defining aspect of the value space (e.g. minimum value, maximum value, etc.)
Simple types are created or derived by restricting these aspects.
#3
Primitive vs. Derived
Definition 1:
A primitive datatype is one that is not defined in terms of other datatypes (i.e. it exists ab initio).
Definition 2:
A derived datatype is one that is defined in terms of other datatypes.
Primitive types: string, boolean, decimal, float, double, duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth, hexBinary, base64Binary, anyURI, QName, NOTATION
#4
Fundamental Facets
A fundamental facet is an abstract property that characterizes a value space.
Keep in mind that this is a facet of the value space and not the lexical space.
These facets are:
equal - every value space has the notion of equality.
ordered - a value space may have no order, a partial order, or a total order.
bounded - whether the value space has upper and lower bounds.
cardinality - finite or countably infinite (XML Schema has no notion of an uncountable value space).
numeric - a boolean indicating whether it represents a quantity or mathematical number system
#5
Constraining Facets
A constraining facet is an optional property that can be applied to constrain a value space.
Sometimes the constraint is expressed on the lexical space instead (the pattern facet, for example).
For example, to restrict to the even integers between 1 and 100:
Use the minInclusive/maxInclusive to restrict the range to 2-100
Use a regular expression to restrict the last digit to 0, 2, 4, 6, or 8.
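The even-integer example above could be sketched as a complete schema like this. The type name EvenTo100 and the exact pattern are assumptions made for illustration, not something the slides prescribe.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- EvenTo100 is a hypothetical name chosen for this sketch -->
  <xs:simpleType name="EvenTo100">
    <xs:restriction base="xs:integer">
      <!-- value space: 2 <= value <= 100 -->
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="100"/>
      <!-- lexical space: the literal must end in an even digit -->
      <xs:pattern value="[0-9]*[02468]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Because the pattern works on the literal, this sketch also rejects "+42", which is otherwise a legal xs:integer literal.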
The next few slides will go through the constraining facets.
#6
Type Derivation by Restricting
You derive new simple types by restricting existing types.
The basic structure is:
<xs:simpleType name="sometype">
  <xs:restriction base="parent">
    ...
  </xs:restriction>
</xs:simpleType>
The 'base' attribute points to the type you are restricting.
All values must be valid values in the base type's value space.
Inside the 'restriction' element you place your restrictions!
#7
Range Restrictions
Use these elements:
minInclusive - a minimum where the boundary value can be used.
maxInclusive - a maximum where the boundary value can be used.
minExclusive - a minimum where the boundary value cannot be used.
maxExclusive - a maximum where the boundary value cannot be used.
The maximum must never be less than the minimum.
The integers between 2 and 5:
<xs:simpleType name="score">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="2"/>
    <xs:maxInclusive value="5"/>
  </xs:restriction>
</xs:simpleType>
#8
Length Restrictions
Use these elements:
length - the exact length of the lexical value.
minLength - the minimum length of the lexical value.
maxLength - the maximum length of the lexical value.
The maximum must never be less than the minimum.
A string with exactly two characters, and a name of 2 to 50 characters:
<xs:simpleType name="code">
  <xs:restriction base="xs:string">
    <xs:length value="2"/>
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="name">
  <xs:restriction base="xs:string">
    <xs:minLength value="2"/>
    <xs:maxLength value="50"/>
  </xs:restriction>
</xs:simpleType>
#9
Digit Restrictions
This only applies to xs:decimal and the types derived from it.
Use these elements:
totalDigits - the maximum number of digits (including fractional parts).
fractionDigits - the maximum number of fractional digits.
The fraction digits must not exceed the total digits.
A money amount with at most eight digits, at most two of them fractional (e.g. 999999.99):
<xs:simpleType name="amount">
  <xs:restriction base="xs:decimal">
    <xs:totalDigits value="8"/>
    <xs:fractionDigits value="2"/>
  </xs:restriction>
</xs:simpleType>
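Note that totalDigits counts digits rather than limiting magnitude: a value such as 12345678 has eight digits and no fractional part, so it satisfies the facets above. If the intent is strictly "less than $1 million", a range facet states that directly. A minimal sketch, assuming a hypothetical type name and assuming amounts are never negative:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- AmountUnderOneMillion is a hypothetical name for this sketch -->
  <xs:simpleType name="AmountUnderOneMillion">
    <xs:restriction base="xs:decimal">
      <!-- at most two digits after the decimal point -->
      <xs:fractionDigits value="2"/>
      <!-- assumption: amounts are non-negative -->
      <xs:minInclusive value="0"/>
      <!-- the upper bound itself is excluded -->
      <xs:maxExclusive value="1000000"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```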
#10
Whitespace Restrictions
The 'whiteSpace' element controls the processing of whitespace according to these values:
preserve - all whitespace is preserved.
replace - replaces tabs, line feeds, and carriage returns with spaces.
collapse - removes trailing and leading spaces and replaces multiple whitespace characters with a single space.
Normalizing the whitespace in a type name (leading, trailing, and repeated whitespace is removed):
<xs:simpleType name="TypeName">
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="collapse"/>
  </xs:restriction>
</xs:simpleType>
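Collapsing normalizes whitespace but still permits a single space between words. A minimal sketch, assuming the hypothetical type name CompactName, that combines collapse with a pattern so that values with embedded spaces are rejected:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- CompactName is a hypothetical name for this sketch -->
  <xs:simpleType name="CompactName">
    <xs:restriction base="xs:string">
      <!-- whitespace is normalized before the other facets are checked -->
      <xs:whiteSpace value="collapse"/>
      <!-- one or more non-whitespace characters, nothing else -->
      <xs:pattern value="\S+"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

With this type, " abc " should be accepted (it collapses to "abc") while "a b" should not.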
#11
Enumerated Value Restrictions
The 'enumeration' element specifies an exact value.
You can have more than one of these to specify different values.
The enumeration value must be an instance of the base type.
A status keyword enumeration:
<xs:simpleType name="Status">
  <xs:restriction base="xs:string">
    <xs:enumeration value="draft"/>
    <xs:enumeration value="last call"/>
    <xs:enumeration value="proposed recommendation"/>
    <xs:enumeration value="unknown"/>
  </xs:restriction>
</xs:simpleType>
#12
Combining Restrictions
You can combine several facets in one restriction:
<xs:simpleType name="Status">
  <xs:restriction base="xs:integer">
    <xs:totalDigits value="2"/>
    <xs:enumeration value="23"/>
    <xs:enumeration value="29"/>
    <xs:enumeration value="31"/>
    <xs:enumeration value="37"/>
  </xs:restriction>
</xs:simpleType>
#13
Pattern Restrictions
The 'pattern' element specifies a regular expression for the lexical value.
The book has a whole section on patterns. Read it!!!
If you are familiar with regular expressions, then you'll feel at home.
The syntax is similar to that of the java.util.regex package.
A US Zip+4 code:
<xs:simpleType name="ZipPlus4">
  <xs:restriction base="xs:string">
    <xs:pattern value="[0-9]{5}(-[0-9]{4})?"/>
  </xs:restriction>
</xs:simpleType>
#14
Regular Expressions - Quick Tour
A regular expression consists of one or more branches separated by '|':
pattern1 | pattern2 | pattern3
A branch is a sequence of atoms--each of which can have a quantifier.
An atom is:
A character 'a' or character class '[a-z]'.
A parenthesized regular expression: (a|b)
An escape (e.g. \d or \p{IsBasicLatin})
#15
Regular Expressions - Character Classes & Escapes
Special syntax characters like tabs or parentheses can be embedded with escapes.
For example, the tab character is \t and a left parenthesis is \(
Certain classes of characters have their own escapes:
\d is any decimal digit and \D is any character that is not a decimal digit.
\s is a whitespace character and \S is any non-whitespace character.
\w is any character not considered punctuation, separators, or other by Unicode and \W is the opposite.
See pages 187-188 for more information.
#16
Regular Expressions - Block Escapes
You can refer to Unicode blocks by name too.
Two forms:
\p{IsXXX} - any character from the block 'XXX'
\P{IsXXX} - any character not from the block 'XXX'
Example:
\p{IsGreek} - any Greek character
\P{IsMathematicalOperators} - any non-math operator character.
See pages 190-195 for more information.
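A block escape is used inside a pattern facet like any other atom. A minimal sketch, assuming a hypothetical GreekWord type built on xs:token:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- GreekWord is a hypothetical name for this sketch -->
  <xs:simpleType name="GreekWord">
    <xs:restriction base="xs:token">
      <!-- one or more characters, all taken from the Greek block -->
      <xs:pattern value="\p{IsGreek}+"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

A block is a range of code points, so this accepts any character assigned to that block, including its punctuation and symbols, not only letters.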
#17
Regular Expressions - Quantifiers
Any atom can have a quantifier as a suffix.
Basic forms
? - optional (0 or 1 times).
* - zero or more times.
+ - one or more times.
Ranges
{n} - exactly n times.
{n,} - n or more times.
{n,m} - between n and m times--inclusive.
#18
Regular Expressions - Examples
A US zip code: \d{5}(-\d{4})?
A US social security number: \d{3}-\d{2}-\d{4}
A street address with a house number first: \d+\s+\S.*
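Expressions like these go into the value attribute of a pattern facet. A minimal sketch, with hypothetical type names, wrapping the social security number and street examples:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Both type names are hypothetical, chosen for this sketch -->
  <xs:simpleType name="SocialSecurityNumber">
    <xs:restriction base="xs:string">
      <!-- three digits, two digits, four digits, hyphen separated -->
      <xs:pattern value="\d{3}-\d{2}-\d{4}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="StreetAddress">
    <xs:restriction base="xs:string">
      <!-- house number, whitespace, then the rest of the line -->
      <xs:pattern value="\d+\s+\S.*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

A pattern must match the entire literal, so no ^ or $ anchors are needed. Also note that \d matches any Unicode decimal digit; use [0-9] if only ASCII digits should be accepted.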
#19
Atomic Values and Types
An atomic value is one that isn't divided by whitespace (e.g. "token", "10", "true").
Simple types are atomic if they aren't lists or unions.
This becomes important when you have lists of values:
A list of integers:
1 2 3 4 5 6 7 8 9 10
This has ten values (duh!).
A list of strings:
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
This is a list of 18 strings--not two or one.
#20
List Simple Types
A list simple type specifies a list of values from an atomic type.
An empty list is allowed unless you restrict the list type with minLength.
Example:
<xs:simpleType name="IntegerVector">
  <xs:list itemType="xs:integer"/>
</xs:simpleType>
An instance:
10 -2 5 12 3
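The length facets can be applied to a list type, where they count items rather than characters. A minimal sketch, assuming the IntegerVector type above and a hypothetical derived type that states the at-least-one-item requirement explicitly:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="IntegerVector">
    <xs:list itemType="xs:integer"/>
  </xs:simpleType>
  <!-- NonEmptyIntegerVector is a hypothetical name for this sketch -->
  <xs:simpleType name="NonEmptyIntegerVector">
    <xs:restriction base="IntegerVector">
      <!-- for a list, minLength counts list items -->
      <xs:minLength value="1"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```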
#21
List Simple Types - Inlining Types
You can define the item type inline, inside the 'list' element.
Example:
<xs:simpleType name="MenShoeSizesUS">
  <xs:list>
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="7"/>
        <xs:maxInclusive value="14"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:list>
</xs:simpleType>
An instance:
9 10 14
#22
Union Simple Types
A union simple type specifies a union of the values of other simple types (atomic or list).
Example:
<xs:simpleType name="size">
  <xs:union memberTypes="xs:integer">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="small"/>
        <xs:enumeration value="medium"/>
        <xs:enumeration value="large"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>
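A union can in turn serve as the item type of a list, which ties the three categories together. A minimal sketch, repeating the size type above and adding a hypothetical SizeList type:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="size">
    <xs:union memberTypes="xs:integer">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="small"/>
          <xs:enumeration value="medium"/>
          <xs:enumeration value="large"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>
  <!-- SizeList is a hypothetical name for this sketch -->
  <xs:simpleType name="SizeList">
    <!-- each whitespace-separated item must be a valid size -->
    <xs:list itemType="size"/>
  </xs:simpleType>
</xs:schema>
```

An instance such as "10 small 12 large" should validate against SizeList. Each item is checked against the member types in order, so "10" is treated as an integer and "small" falls through to the enumeration.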
#23
String-based Built-ins
xs:string - has whiteSpace facet value of 'preserve'.
xs:normalizedString - has whiteSpace facet value of 'replace'.
xs:token - has whiteSpace facet value of 'collapse'.
Examples:
xs:string | xs:normalizedString | xs:token |
---|---|---|
leading trailing and inter-word. | leading trailing and inter-word. | leading trailing and inter-word. |
#24
Unbounded Integer Built-ins
xs:integer - derived from decimal by setting fractionDigits to '0'.
xs:nonPositiveInteger - derived from xs:integer by setting maxInclusive to '0'.
xs:negativeInteger - derived from xs:nonPositiveInteger by setting maxInclusive to '-1'.
xs:nonNegativeInteger - derived from xs:integer by setting minInclusive to '0'.
xs:positiveInteger - derived from xs:nonNegativeInteger by setting minInclusive to '1'.
#25
Bounded Integer Built-ins
xs:long - derived from xs:integer by setting maxInclusive to 9223372036854775807 and minInclusive to -9223372036854775808.
xs:int - derived from xs:long by setting maxInclusive to 2147483647 and minInclusive to -2147483648.
xs:short - derived from xs:int by setting maxInclusive to 32767 and minInclusive to -32768.
xs:byte - derived from xs:short by setting maxInclusive to 127 and minInclusive to -128.
#26
Bounded Unsigned Integer Built-ins
xs:unsignedLong - derived from xs:nonNegativeInteger by setting maxInclusive to 18446744073709551615.
xs:unsignedInt - derived from xs:unsignedLong by setting maxInclusive to 4294967295.
xs:unsignedShort - derived from xs:unsignedInt by setting maxInclusive to 65535.
xs:unsignedByte - derived from xs:unsignedShort by setting maxInclusive to 255.
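User-defined types can continue these derivation chains in the same way the built-ins do. A minimal sketch, assuming a hypothetical Percent type that narrows xs:unsignedByte:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Percent is a hypothetical name for this sketch -->
  <xs:simpleType name="Percent">
    <xs:restriction base="xs:unsignedByte">
      <!-- the lower bound of 0 is inherited from the base type -->
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```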
#27
Floating Point
#28
Miscellaneous Types
xs:boolean - Values are 'true' and 'false'.
xs:anyURI - A URI as defined in [RFC 2396] and as amended by [RFC 2732].
xs:hexBinary - represents arbitrary hex-encoded binary data (e.g. 0FB7).
xs:base64Binary - binary data encoded in base64 as specified in [RFC 2045].
#29
Dates, Times, and Durations
Durations are periods of time that are not anchored to a specific date or time.
Dates are with respect to the Gregorian calendar.
Timezones are expressed as offsets from Coordinated Universal Time (UTC, sometimes called "Greenwich Mean Time").
#30
Dates & Times
xs:date - an ISO 8601 date:
A minus sign ('-') prefix means "BCE" (before the common era).
The common format is 'YYYY-MM-DD'.
You can add a timezone offset, just as for xs:dateTime (e.g. 2005-03-10-08:00).
A date is constrained by the same facets as xs:dateTime.
xs:dateTime - an ISO 8601 date plus a time.
The date is separated from the time by a 'T': 2005-03-10T02:00:00-08:00
The time format is 'hh:mm:ss' with optional fractional seconds and an optional timezone offset.
Use 'Z' for UTC or give an explicit offset; a value with neither carries no timezone at all.
The constraining facets are: pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive
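The range facets accept date literals as their values. A minimal sketch, assuming a hypothetical DateIn2005 type that only admits dates in the year 2005:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- DateIn2005 is a hypothetical name for this sketch -->
  <xs:simpleType name="DateIn2005">
    <xs:restriction base="xs:date">
      <!-- bounds are written in the same lexical format as instances -->
      <xs:minInclusive value="2005-01-01"/>
      <xs:maxInclusive value="2005-12-31"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

The bounds here carry no timezone, so instances that do carry one may compare indeterminately near the edges of the range; treat the sketch as illustrative only.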
#31
Durations
xs:duration - an ISO 8601 duration
The syntax is PnYnMnDTnHnMnS where each 'n' is a number.
Y - years, M - months, D - days, H - hours, M - minutes, S - seconds.
A minus sign prefix signifies a "negative duration".
You can omit fields, but the 'T' separator is required before any hour, minute, or second field, even when no date fields are present.
Examples:
1 month: P1M
2 days, 3 hours: P2DT3H
30 months: P30M
30 minutes: PT30M
1 hour 30 minutes: PT1H30M
See answers.com for more information on ISO 8601.
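Duration literals can also be used as bounds in a restriction. A minimal sketch, assuming a hypothetical ShiftLength type that allows between 30 minutes and 12 hours:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- ShiftLength is a hypothetical name for this sketch -->
  <xs:simpleType name="ShiftLength">
    <xs:restriction base="xs:duration">
      <!-- 30 minutes: the T is required because only time fields appear -->
      <xs:minInclusive value="PT30M"/>
      <!-- 12 hours -->
      <xs:maxInclusive value="PT12H"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Under this sketch PT30M and PT1H30M should be accepted and PT13H rejected. Durations that mix months with days are only partially ordered, so bounds like these are easiest to reason about when they use day and time fields only.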
#32
Date Parts
xs:gYearMonth - Lexical format YYYY-MM
xs:gYear - Lexical format YYYY
xs:gMonthDay - Lexical format --MM-DD
xs:gMonth - Lexical format --MM
xs:gDay - Lexical format ---DD