XML and the Web

R. Alexander Miłowski


School of Information, UC Berkeley

What is XML?

XML is a syntax. HTML is a vocabulary.
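Because XML is only a syntax, one generic parser can read any vocabulary; the element names carry no meaning for it. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. The two tiny documents and the `describe` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents in different vocabularies, written in the same XML syntax.
recipe = "<recipe><step>Boil water.</step><step>Add pasta.</step></recipe>"
invoice = "<invoice><item price='3.50'>Pasta</item></invoice>"

def describe(text: str) -> list[str]:
    """Return element names in document order; the parser assigns them no meaning."""
    root = ET.fromstring(text)
    return [element.tag for element in root.iter()]

print(describe(recipe))   # ['recipe', 'step', 'step']
print(describe(invoice))  # ['invoice', 'item']
```

The same call handles both documents. Deciding what a `step` or an `item` means is left to the application, and that interpretation is what a vocabulary such as HTML defines.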

XML is:

Some Well-Known XML Vocabularies

A Complete Example

The source document.

<?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
    <title>A Sample Document</title>
    <section>
        <title>Some Mathematics</title>
        <para>Here is an unknown equation: <equation>
            <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block">
                <!-- the rest of the MathML markup is not preserved in this extract -->
                <mml:mi mathvariant="bold">a</mml:mi>
                <mml:mi mathvariant="bold">F</mml:mi>
                <mml:mi mathvariant="bold">E</mml:mi>
                <mml:mi mathvariant="bold">v</mml:mi>
                <mml:mi mathvariant="bold">B</mml:mi>
            </mml:math>
        </equation></para>
    </section>
</article>
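The sample mixes two vocabularies, DocBook and MathML, and only their namespace URIs tell them apart. The sketch below counts elements per namespace with Python's `xml.etree.ElementTree`. The embedded document is a trimmed copy of the sample that keeps only two MathML identifiers, and `namespaces_used` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed copy of the sample document; only two MathML identifiers are kept.
SOURCE = """<?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>A Sample Document</title>
  <section>
    <title>Some Mathematics</title>
    <para>Here is an unknown equation: <equation>
      <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block">
        <mml:mi mathvariant="bold">F</mml:mi>
        <mml:mi mathvariant="bold">E</mml:mi>
      </mml:math>
    </equation></para>
  </section>
</article>"""

def namespaces_used(text: str) -> Counter:
    """Count elements per namespace URI; ElementTree writes names as '{uri}local'."""
    root = ET.fromstring(text.encode("utf-8"))
    counts = Counter()
    for element in root.iter():
        if element.tag.startswith("{"):
            uri = element.tag[1:].split("}", 1)[0]
        else:
            uri = ""  # element in no namespace
        counts[uri] += 1
    return counts

for uri, count in namespaces_used(SOURCE).items():
    print(count, uri)
```

The `mml` prefix is only a local shorthand. The parser reports the full URI, so a MathML processor can pick out its own elements no matter which prefix the author chose.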

A More Complicated Example

An extract from a US State Department volume of the Foreign Relations of the United States series, encoded in TEI.

<titlePage type="main">
    <pb n="I" xml:id="pg_I" facs="0001"/>
    <graphic url="figure_0001.tif"/>
    <titlePart type="series">Foreign Relations of the United States</titlePart>
    <titlePart type="subseries">1945–1950</titlePart>
    <titlePart type="volume">Emergence of the Intelligence Establishment</titlePart>
    <docImprint>Department of State<lb/>Washington, DC</docImprint>
    <byline>Editor: <persName>C. Thomas Thorne, Jr.</persName>
        <persName>David S. Patterson</persName></byline>
    <byline>General Editor: <persName>Glenn W. LaFantasie</persName></byline>
    <docImprint>
        <publisher>United States Government Printing Office</publisher>
        <pb n="II" xml:id="pg_II" facs="0002"/>
        <docDate>1996</docDate>DEPARTMENT OF STATE PUBLICATION 10316<lb/>OFFICE OF THE
        HISTORIAN<lb/>BUREAU OF PUBLIC AFFAIRS<lb/>For sale by the U.S. Government
        Printing Office Superintendent of Documents, Mail Stop: SSOP, Washington, DC
        20402-9328<lb/>ISBN 0-16-045208-2</docImprint>
</titlePage>
<pb n="III" xml:id="pg_III" facs="0003"/>
<div type="section" xml:id="preface">
    <p>The <hi rend="italic">Foreign Relations</hi> of the United States series presents
        the official documentary historical record of major foreign policy decisions and
        significant diplomatic activity of the United States Government. The series
        documents the facts and events that contributed to the formulation of policies
        and includes evidence of supporting and alternative views to the policy
        positions ultimately adopted.</p>
</div>
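The extract marks each printed page with a `pb` element whose `xml:id` attribute uses the reserved `xml` prefix. That prefix is always bound to `http://www.w3.org/XML/1998/namespace` and never needs a declaration. The sketch below lists the page breaks with Python's `xml.etree.ElementTree`. The snippet is shortened from the extract, the enclosing `front` element is an assumed wrapper that gives it a single root, and `page_breaks` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # fixed binding of the xml: prefix

# Shortened from the extract; <front> is an assumed wrapper so there is one root element.
SOURCE = """<front>
  <titlePage type="main">
    <pb n="I" xml:id="pg_I" facs="0001"/>
    <titlePart type="series">Foreign Relations of the United States</titlePart>
    <docImprint><pb n="II" xml:id="pg_II" facs="0002"/><docDate>1996</docDate></docImprint>
  </titlePage>
  <pb n="III" xml:id="pg_III" facs="0003"/>
</front>"""

def page_breaks(text: str) -> list[tuple[str, str]]:
    """Return (page number, xml:id) for every pb element, in document order."""
    root = ET.fromstring(text)
    return [(pb.get("n"), pb.get(f"{{{XML_NS}}}id")) for pb in root.iter("pb")]

print(page_breaks(SOURCE))  # [('I', 'pg_I'), ('II', 'pg_II'), ('III', 'pg_III')]
```

`root.iter("pb")` finds page breaks at any depth, including the one inside `docImprint`. Milestone elements like `pb` can appear almost anywhere in the text.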

Well-formed XML Documents
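A document is well-formed when it follows XML's syntax rules. For example, it must have a single root element, every start tag must have a matching end tag, elements must nest properly, and attribute values must be quoted. The specification requires a parser to treat any violation as a fatal error. The sketch below checks well-formedness with Python's `xml.etree.ElementTree`. It checks syntax only and does not validate against a schema, and `is_well_formed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML (syntax only, no schema validation)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<byline>Editor: <persName>C. Thomas Thorne, Jr.</persName></byline>"))  # True
print(is_well_formed("<byline>Editor: <persName>C. Thomas Thorne, Jr.</persName>"))  # False: byline never closed
print(is_well_formed("<b><i>overlap</b></i>"))  # False: improper nesting
print(is_well_formed("<a/><b/>"))  # False: two root elements
print(is_well_formed("<pb n=III/>"))  # False: unquoted attribute value
```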

Naming in XML

Purpose: to differentiate your use of names from mine.

Is "address" a street location or a computer's IP address?
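Namespaces resolve this ambiguity by pairing each local name with a URI. In the sketch below, two `address` elements share a local name but belong to different vocabularies, and Python's `xml.etree.ElementTree` tells them apart. Both namespace URIs, the `contact` element, and the sample values are hypothetical. The IP address comes from a range reserved for documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: one vocabulary for postal addresses, one for networks.
POSTAL = "http://example.com/ns/postal"
NET = "http://example.com/ns/network"

SOURCE = f"""<contact xmlns:p="{POSTAL}" xmlns:n="{NET}">
  <p:address>123 Main Street</p:address>
  <n:address>192.0.2.10</n:address>
</contact>"""

root = ET.fromstring(SOURCE)
# The prefixes used in a query are independent of the ones in the document;
# only the URIs they map to have to match.
street = root.find("postal:address", {"postal": POSTAL}).text
ip = root.find("net:address", {"net": NET}).text
print(street)  # 123 Main Street
print(ip)      # 192.0.2.10
```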

Syntax of Namespaces
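An `xmlns="URI"` attribute sets the default namespace for unprefixed element names. An `xmlns:prefix="URI"` attribute binds a prefix. Both declarations apply to the element that carries them and to its descendants. Unprefixed attributes belong to no namespace. The sketch below shows these rules with Python's `xml.etree.ElementTree`, using the DocBook and MathML URIs from the earlier sample. The tiny document itself is made up.

```python
import xml.etree.ElementTree as ET

DOCBOOK = "http://docbook.org/ns/docbook"
MATHML = "http://www.w3.org/1998/Math/MathML"

# xmlns="..." sets the default namespace; xmlns:mml="..." binds the prefix mml.
SOURCE = f"""<article xmlns="{DOCBOOK}" xmlns:mml="{MATHML}">
  <para>Text <mml:math display="block"><mml:mi>F</mml:mi></mml:math></para>
</article>"""

root = ET.fromstring(SOURCE)
# Every element name is expanded to "{namespace URI}local-name".
names = [element.tag for element in root.iter()]
math = root.find("db:para/mml:math", {"db": DOCBOOK, "mml": MATHML})
print(names)
print(math.attrib)  # {'display': 'block'}: the unprefixed attribute is in no namespace
```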

Why didn't XML take over the world?

Answer #1: It did. You just don't see it.

Answer #2: It isn't an intrinsic vocabulary. It's a syntax.

Answer #3: Web browsers don't do the same thing with XML documents as they do with HTML.

Impoverished Processing Model

When the browser loads an XML document:

  • scripts do not run,
  • links are not identified,
  • embedded objects do not load.

As a result, XML-based applications do not work well in the browser.

This can be fixed:

  • there are standards,
  • but it is a little late,
  • and browser vendors aren't interested.