HTML and Intrinsic Vocabularies

R. Alexander Miłowski

School of Information, UC Berkeley

What is an Intrinsic Vocabulary?

In the context of markup:

A vocabulary is a specific set of elements and attributes (e.g., HTML, SVG) along with the rules for their use.

From Merriam-Webster:

intrinsic (adjective): belonging to the essential nature of a thing : occurring as a natural part of something

In the context of the Web:

An intrinsic vocabulary is one that is provided or expected by all implementations of the Open Web Platform (OWP).

The Intrinsic Vocabularies of the OWP

HTML5 establishes three vocabularies as intrinsic: HTML, SVG, and MathML.

Markup - pointy brackets - a midlife crisis?

From markup's assorted beginnings:

it is a 45-year-old idea.

We still struggle with:

  • identity
  • annotation
  • mixing
  • semantics

and that's just naming a few things...

But it is about as good as it gets...

URIs, Resources, and Representations

  • We name resources with URIs.
  • User agents retrieve representations (HTML) via the URI.
  • User agents render that representation for use by the user.

HTML (hypertext) is the interface to the Web by which we view, navigate, and manipulate information.

HTML is markup with a long heritage...

What really happens ...

We expect more!

HTML markup - 1

<p>This is a paragraph.</p> ⇐ an element

<img src=""> ⇐ an empty element (no end tag)
      ⇑ an attribute with a quoted literal value

<ul>
   <li>Item One</li> ⇐ nested element
   <li>Item Two</li>
</ul>
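
The parts labelled on this slide (element, empty element, attribute) can be observed by running markup through a parser. The following is a minimal sketch, assuming Python's standard `html.parser` module; the class name `TagLogger`, the helper `tokenize`, and the image name `smiley.png` are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record start tags (with attributes), end tags, and text in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.events.append(("text", text))


def tokenize(markup):
    """Return the list of parse events for a markup string."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


# "smiley.png" is a made-up file name used only for this example.
events = tokenize('<p>This is a paragraph.</p><img src="smiley.png">')
for event in events:
    print(event)
```

Note that the `img` element produces a start event but no end event, which matches its description as an empty element with no end tag.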

HTML markup - 2

Basic document structure

<!DOCTYPE html> ⇐ This is the new HTML5 prolog.
<html lang="en">
<head> ⇐ Lots of things happen here but they don't render!
   <title>My First HTML Document</title>
</head>
<body> ⇐ Things you see go here.
   <p>So exciting!</p>
</body></html>

The Document Object Model (DOM)

The browser builds a tree of nodes.

<h1>Document Object Model</h1>
<p>This is my first paragraph</p>
<p>My second paragraph has a list:</p>
<ul>
<li>Item One</li>
<li>Item Two</li>
<li>Item Three</li>
</ul>
<p>This is the third paragraph</p>
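
To make the "tree of nodes" concrete, here is a rough sketch that builds a simplified tree from well-formed markup using Python's standard `html.parser`. It is not the DOM API and does none of the error recovery a real browser performs; the names `Node`, `TreeBuilder`, `build_tree`, and `outline` are hypothetical.

```python
from html.parser import HTMLParser


class Node:
    """A simplified tree node: an element or a text node."""

    def __init__(self, name, parent=None, text=None):
        self.name = name
        self.parent = parent
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Build a node tree, assuming every start tag has a matching end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent  # climb back up

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.children.append(Node("#text", self.current, text))


def build_tree(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as a list of indented lines."""
    label = node.name if node.text is None else '#text "' + node.text + '"'
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


markup = (
    "<h1>Document Object Model</h1>"
    "<p>This is my first paragraph</p>"
    "<ul><li>Item One</li><li>Item Two</li></ul>"
)
tree = build_tree(markup)
print("\n".join(outline(tree)))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the markup.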

id attributes

<div id="important">something important</div>

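An id allows an element to be uniquely identified within the document, for example as the target of a fragment identifier.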

What's a Fragment Identifier?

The fragment is interpreted by the client and never sent to the server.

[Diagram: a URI with its protocol, domain, path, and fragment labelled]

A fragment identifier is a simple string of characters, typically associated with the identity of a particular location in the document format.
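
As an illustration of the split between what the server sees and what the client keeps, here is a small sketch using Python's standard `urllib.parse`. The URL is hypothetical, and `split_for_request` is a made-up helper name.

```python
from urllib.parse import urldefrag, urlsplit


def split_for_request(url):
    """Return (URL to send to the server, fragment kept by the client)."""
    request_url, fragment = urldefrag(url)
    return request_url, fragment


# A hypothetical URL used only for illustration.
url = "http://www.example.com/docs/page.html#important"

parts = urlsplit(url)
print(parts.scheme)    # protocol
print(parts.netloc)    # domain
print(parts.path)      # path
print(parts.fragment)  # fragment

request_url, fragment = split_for_request(url)
print(request_url)     # what is requested from the server
print(fragment)        # what the client resolves within the document
```

Here the fragment `important` would typically be matched by the client against an element carrying `id="important"` in the retrieved document.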

An Inventory of Common Elements

article   p           a        img
section   div         em       iframe
header    pre         cite     audio
footer    blockquote  q        video
nav       ol          dfn      svg
aside     ul          code     math
address   figure      span
table     main        sub/sup

MathML and SVG are there as part of HTML5!

Scalable Vector Graphics (SVG)

Because SVG is intrinsic, you can embed vector images directly in the markup.
<h3>A smiley</h3>
<svg version="1.1" 
height="400" ...>

[Figure: the rendered "A smiley" heading and SVG image]

MathML is a bit harder

MathML is not consistently implemented across the various browsers.

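\[ a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{a_4}}}} \]

\[ \left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) \left| \varphi(x + i y) \right|^2 = 0 \]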

MathJax is one way to work around this.


HTML5 has two serializations: the HTML syntax and the XHTML syntax.


I prefer XHTML because I typically use an XML tool chain.
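
One practical consequence of the XHTML syntax is that a generic XML parser can read the document. The sketch below assumes Python's standard `xml.etree.ElementTree` as a stand-in for "an XML tool chain"; the sample document and the helper `find_by_id` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The XHTML namespace; elements in the XHTML syntax live in it.
XHTML_NS = "http://www.w3.org/1999/xhtml"
NS = {"x": XHTML_NS}

# A small, made-up XHTML document for illustration.
xhtml = """<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head><title>My First HTML Document</title></head>
  <body><p id="important">So exciting!</p></body>
</html>"""


def find_by_id(root, wanted):
    """Return the first element whose id attribute equals wanted, or None."""
    for element in root.iter():
        if element.get("id") == wanted:
            return element
    return None


root = ET.fromstring(xhtml)
title = root.find("x:head/x:title", NS).text
para = find_by_id(root, "important")

print(title)
print(para.tag, para.text)
```

The same input fed with a missing end tag would raise a parse error, which is the trade-off of the XHTML syntax: stricter authoring in exchange for ordinary XML tooling.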