RSS and ATOM
R. Alexander Milowski
milowski at sims.berkeley.edu
#1
News Syndication
Both RSS and ATOM can be viewed as "news syndication" formats.
Each describes a vocabulary for describing articles and resources on the web.
A service provider or web site can provide these feeds to users.
Because a feed is an XML vocabulary, users can manipulate it in many ways:
Aggregation of topics.
Filtering by keyword, site, author, etc.
Display as website content, sidebars, etc.
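A minimal sketch of the "filtering by keyword" idea, using only Python's standard library. The feed content and the function name `filter_items` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, made up for this example.
FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.org/</link>
    <description>An invented feed.</description>
    <item><title>XML tools roundup</title><link>http://example.org/1</link></item>
    <item><title>Weather report</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def filter_items(feed_xml, keyword):
    """Return (title, link) pairs for items whose title mentions keyword."""
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        # Case-insensitive substring match on the title only.
        if keyword.lower() in title.lower():
            matches.append((title, item.findtext("link", default="")))
    return matches


print(filter_items(FEED, "xml"))
```

Aggregation is the same loop run over several feeds, with the results merged into one list.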
#2
History of RSS
RSS: Really Simple Syndication
Started in 1997 as "scriptingNews" at UserLand.
RSS 0.90 developed at Netscape in 1999.
RSS 0.91 adopts most of the features from scriptingNews in 1999 and UserLand switches to RSS.
RSS 1.0 is published in 2000 by a "private group". It is based on RDF, which makes it a completely different format from RSS 0.91.
RSS 0.92 is published in December of 2000.
RSS 2.0 is published through Harvard in 2003 under the Creative Commons license.
RSS 2.0 is basically a formalization of version 0.92.
#3
Current Status and Use of RSS
The current specification is RSS 2.0, available from the Berkman Center for Internet & Society at Harvard Law School.
Places you'll see RSS:
Many tech websites: java.net, xml.com, etc.
News websites: CNN.com, sfgate.com, etc.
Many techies have their own RSS feeds.
RSS is easy. It is just an XML document on your website! ...but you can make it more complicated if you wish.
#4
The RSS Concept
An RSS feed consists of a channel (RSS 2.0 allows exactly one per document).
A channel has metadata and a set of items.
Each item has metadata about an article or web resource.
Each item can contain a description (which may be the whole thing).
#5
The Basic RSS Structure
RSS elements are not in any namespace.
The document element is 'rss' and, in RSS 2.0, contains a single 'channel' element:
<rss version="2.0">
  <channel>
    ...
  </channel>
</rss>
Every channel element must have:
A 'title' element:
<title>Center for Document Engineering</title>
A 'link' element:
<link>http://cde.berkeley.edu</link>
A 'description' element:
<description>The Center for Document Engineering is a cool place.</description>
Usually these elements precede the 'item' elements but order is not specified.
There are many other optional elements that can be added to a channel.
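The required channel elements can be checked mechanically. This is a sketch only; the function name `missing_channel_elements` is an assumption, and the feed reuses the values from this slide.

```python
import xml.etree.ElementTree as ET

# The three child elements every channel must have.
REQUIRED = ("title", "link", "description")

FEED = """<rss version="2.0">
  <channel>
    <title>Center for Document Engineering</title>
    <link>http://cde.berkeley.edu</link>
    <description>The Center for Document Engineering is a cool place.</description>
  </channel>
</rss>"""


def missing_channel_elements(feed_xml):
    """Return (channel index, element name) for each missing required child."""
    root = ET.fromstring(feed_xml)
    problems = []
    for index, channel in enumerate(root.findall("channel")):
        for name in REQUIRED:
            # Order is not checked because the format does not specify one.
            if channel.find(name) is None:
                problems.append((index, name))
    return problems


print(missing_channel_elements(FEED))  # prints []
```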
#6
An RSS Item
An item is a description of a web resource.
It often contains a short summary of the resource.
All the elements of an item are optional, but it must contain at least one of title or description.
Example:
<item>
  <title>Document Engineering: Information Systems 243</title>
  <link>http://cde.berkeley.edu/events/s05is243/</link>
  <description>
    This course introduces a new discipline of document engineering for
    specifying, designing, and deploying the electronic documents that enable
    document-centric business transactions and applications, including web
    services and virtual enterprises. Topics include developing requirements,
    analyzing existing documents, identifying reusable components, modeling
    business processes, representing models using XML schemas, and using XML
    models to implement and drive applications.
  </description>
</item>
An item can have the optional elements:
Metadata: title, link, description, author, pubDate, guid, category
Additional Resources: comments, enclosure
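A sketch of reading an item's metadata into a dictionary while enforcing the "title or description" rule. The helper name `item_to_dict` is hypothetical, and only the first occurrence of each element is kept.

```python
import xml.etree.ElementTree as ET

# Metadata elements named on this slide.
ITEM_FIELDS = ("title", "link", "description", "author", "pubDate", "guid", "category")


def item_to_dict(item):
    """Collect the metadata children that are present on an item element."""
    record = {
        name: item.findtext(name)
        for name in ITEM_FIELDS
        if item.find(name) is not None
    }
    # Everything is optional, except that one of these two must be there.
    if "title" not in record and "description" not in record:
        raise ValueError("an item needs at least a title or a description")
    return record


item = ET.fromstring(
    "<item><title>Document Engineering</title>"
    "<link>http://cde.berkeley.edu/events/s05is243/</link></item>"
)
print(item_to_dict(item))
```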
#7
HTML in Descriptions
Any description element is allowed to have HTML embedded in it.
It isn't XHTML embedded as children.
It is entity-encoded HTML:
<description>&lt;p&gt;I'm a paragraph but you don't know it.&lt;/p&gt;</description>
Or even worse:
<description>How many ampersands do you see: &amp;amp; ?</description>
This makes it difficult but not impossible to process with XML tools.
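One way to cope is to parse twice: the XML parser undoes the entity encoding, then an HTML parser handles the markup that appears. A sketch using Python's standard library; the class and function names are invented for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the character data of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def description_text(item_xml):
    item = ET.fromstring(item_xml)
    # First pass: the XML parser turns &lt;p&gt; back into <p>.
    html_source = item.findtext("description", default="")
    # Second pass: treat the decoded string as HTML.
    extractor = TextExtractor()
    extractor.feed(html_source)
    extractor.close()
    return "".join(extractor.parts)


ITEM = "<item><description>&lt;p&gt;I'm a paragraph but you don't know it.&lt;/p&gt;</description></item>"
print(description_text(ITEM))
```

The double-escaped ampersand case works the same way: each parser removes one level of escaping.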
#8
Namespaces and Extensibility in RSS
All elements in RSS have no namespace associated with them.
Any element in a namespace is considered an extension.
For example, you could add a Dublin Core metadata element, provided the 'dc' prefix is declared for the Dublin Core namespace:
<dc:creator>Alex Milowski</dc:creator>
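Because core RSS elements have no namespace, a consumer can separate extensions just by looking at the element name. A sketch, assuming Python's ElementTree (which writes namespaced names as `{uri}local`); `split_item` is a made-up helper.

```python
import xml.etree.ElementTree as ET

# Namespace name of the Dublin Core element set.
DC = "http://purl.org/dc/elements/1.1/"

ITEM = """<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Namespaces in RSS</title>
  <dc:creator>Alex Milowski</dc:creator>
</item>"""


def split_item(item):
    """Separate core RSS children from namespaced extension children."""
    core, extensions = {}, {}
    for child in item:
        # ElementTree spells a namespaced name as "{namespace-uri}local-name".
        if child.tag.startswith("{"):
            extensions[child.tag] = child.text
        else:
            core[child.tag] = child.text
    return core, extensions


item = ET.fromstring(ITEM)
print(split_item(item))
# A specific extension can also be looked up through a prefix map.
print(item.findtext("dc:creator", namespaces={"dc": DC}))
```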
#9
The History of ATOM
There's a good description of where ATOM came from at: http://www.intertwingly.net/wiki/pie/Motivation
In many ways, ATOM is designed as a "next generation" RSS.
But ATOM also has an API for manipulating it.
ATOM is more than just a format for syndication.
#10
The ATOM Concept
ATOM elements are all in the http://purl.org/atom/ns# namespace.
An ATOM feed consists of metadata and a set of entries.
Each entry has metadata about an article or web resource.
Each entry can contain a description (which may be the whole thing).
Sound familiar?
#11
Embedding XHTML in ATOM
ATOM allows XHTML to be embedded where "content" is allowed.
Any place a 'type' attribute is allowed, it may have one of the following values:
text - text only content:
<atom:title type="text">I'm text</atom:title>
html - escaped HTML:
<atom:title type="html">&lt;b&gt;I'm text&lt;/b&gt;</atom:title>
xhtml - a single XHTML div element is allowed. It must be in the XHTML namespace:
<atom:title type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
    <p>A Title</p>
    <p>A Subtitle</p>
  </div>
</atom:title>
Any other element without the type attribute that has text content is assumed to be of type 'text'.
You can use any namespace/prefix scheme you want as long as the 'div' element is in the XHTML namespace.
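A consumer has to branch on the 'type' attribute. The sketch below shows one way to do that with Python's ElementTree; the function name `read_text_construct` and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

TEXT_TITLE = '<atom:title xmlns:atom="http://purl.org/atom/ns#" type="text">I\'m text</atom:title>'
HTML_TITLE = '<atom:title xmlns:atom="http://purl.org/atom/ns#" type="html">&lt;b&gt;I\'m text&lt;/b&gt;</atom:title>'
XHTML_TITLE = (
    '<atom:title xmlns:atom="http://purl.org/atom/ns#" type="xhtml">'
    '<div xmlns="http://www.w3.org/1999/xhtml"><p>A Title</p><p>A Subtitle</p></div>'
    "</atom:title>"
)


def read_text_construct(xml_source):
    """Return (type, value) for an element that carries a 'type' attribute."""
    element = ET.fromstring(xml_source)
    # A missing type attribute is treated as 'text'.
    kind = element.get("type", "text")
    if kind == "xhtml":
        div = element.find("{%s}div" % XHTML)
        if div is None:
            raise ValueError("type='xhtml' requires an XHTML div child")
        return kind, div  # a real element tree, ready for XML tools
    # 'text' is literal text; 'html' is markup source that still needs an HTML parser.
    return kind, element.text or ""


print(read_text_construct(HTML_TITLE))
```

Only the xhtml case hands back structure; the html case still yields a string of markup.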
#12
Common Attributes
ATOM allows the xml:lang and xml:base attributes on any element.
xml:base might be very useful when creating a collection of entry links to a single website.
xml:lang allows you to designate the feed or entry's language.
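A simplified sketch of how xml:base and xml:lang could be applied when reading links. It only looks at the feed and entry elements, not at every ancestor; the feed, URLs, and the name `entry_links` are invented for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of the xml: prefix
ATOM = "http://purl.org/atom/ns#"  # namespace name as given in these slides

FEED = """<feed xmlns="http://purl.org/atom/ns#"
      xml:base="http://example.org/blog/" xml:lang="en">
  <entry><link href="2005/04/first"/></entry>
  <entry xml:base="http://example.org/archive/" xml:lang="fr"><link href="old"/></entry>
</feed>"""


def entry_links(feed_xml):
    """Return (language, absolute link) pairs, one per entry link."""
    feed = ET.fromstring(feed_xml)
    feed_base = feed.get("{%s}base" % XML_NS, "")
    feed_lang = feed.get("{%s}lang" % XML_NS)
    results = []
    for entry in feed.findall("{%s}entry" % ATOM):
        # An entry inherits the feed's base and language unless it overrides them.
        base = urljoin(feed_base, entry.get("{%s}base" % XML_NS, ""))
        lang = entry.get("{%s}lang" % XML_NS, feed_lang)
        for link in entry.findall("{%s}link" % ATOM):
            results.append((lang, urljoin(base, link.get("href", ""))))
    return results


for lang, url in entry_links(FEED):
    print(lang, url)
```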
#13
The Basic Structure
An ATOM feed starts with the [atom:]feed element.
The [atom:]title and [atom:]updated elements are required.
Elements can appear in any order except:
Elements of the same type must be grouped in one sequence.
The [atom:]entry elements must occur last.
An example:
<feed xmlns="http://purl.org/atom/ns#draft-ietf-atompub-format-07">
  <title>Example Feed</title>
  <link href="http://example.org/"/>
  <updated>2003-12-13T18:30:02Z</updated>
  <author>
    <name>John Doe</name>
  </author>
  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/2003/12/13/atom03"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <content>Some text.</content>
  </entry>
</feed>
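The structural rules on this slide (required children, grouping, entries last) are simple enough to check in a few lines. A sketch only, covering just those three rules; `feed_problems` is a hypothetical name and the namespace is the one used in the example feed.

```python
import xml.etree.ElementTree as ET

NS = "http://purl.org/atom/ns#draft-ietf-atompub-format-07"

FEED = """<feed xmlns="http://purl.org/atom/ns#draft-ietf-atompub-format-07">
  <title>Example Feed</title>
  <link href="http://example.org/"/>
  <updated>2003-12-13T18:30:02Z</updated>
  <author><name>John Doe</name></author>
  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <updated>2003-12-13T18:30:02Z</updated>
  </entry>
</feed>"""


def feed_problems(feed_xml):
    """Return a list of rule violations found among the feed's children."""
    feed = ET.fromstring(feed_xml)
    tags = [child.tag for child in feed]
    problems = []
    # Rule 1: title and updated are required.
    for required in ("title", "updated"):
        if "{%s}%s" % (NS, required) not in tags:
            problems.append("missing " + required)
    # Rule 2: elements of the same type must sit next to each other.
    seen, previous = set(), None
    for tag in tags:
        if tag != previous and tag in seen:
            problems.append("children not grouped: " + tag)
        seen.add(tag)
        previous = tag
    # Rule 3: nothing may follow the first entry except more entries.
    entry = "{%s}entry" % NS
    if entry in tags and any(tag != entry for tag in tags[tags.index(entry):]):
        problems.append("entry elements must come last")
    return problems


print(feed_problems(FEED))  # prints []
```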
#14
Person Construct
The ATOM spec has the concept of a "construct" which is much like a complex type in XML Schema.
Here is the "Person Construct" translated into XML Schema:
<xs:complexType name="PersonConstruct">
  <xs:all>
    <xs:element ref="atom:name"/>
    <xs:element ref="atom:uri" minOccurs="0"/>
    <xs:element ref="atom:email" minOccurs="0"/>
    <!-- extension elements from other namespaces; a wildcard inside xs:all needs XML Schema 1.1 -->
    <xs:any minOccurs="0" maxOccurs="unbounded" namespace="##other"/>
  </xs:all>
  <xs:attributeGroup ref="atom:common"/>
</xs:complexType>
An example:
<atom:author>
  <atom:email>alex@milowski.com</atom:email>
  <atom:name>Alex Milowski</atom:name>
  <my:dog>Hudson</my:dog>
</atom:author>
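In code, a Person construct maps naturally onto a small record type: one required name, two optional fields, and a bag of extensions. A sketch under assumptions: the `Person` class, `parse_person`, and the namespace bound to the 'my' prefix are all invented here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Optional

ATOM = "http://purl.org/atom/ns#"  # namespace name as given in these slides


@dataclass
class Person:
    name: str
    uri: Optional[str] = None
    email: Optional[str] = None
    extensions: Dict[str, Optional[str]] = field(default_factory=dict)


def parse_person(element):
    """Build a Person from any element that follows the Person construct."""
    name = element.findtext("{%s}name" % ATOM)
    if name is None:
        raise ValueError("a Person construct requires a name")
    person = Person(
        name=name,
        uri=element.findtext("{%s}uri" % ATOM),
        email=element.findtext("{%s}email" % ATOM),
    )
    # Children outside the ATOM namespace are kept as extensions.
    for child in element:
        if not child.tag.startswith("{%s}" % ATOM):
            person.extensions[child.tag] = child.text
    return person


# The 'my' namespace URI below is hypothetical.
AUTHOR = """<atom:author xmlns:atom="http://purl.org/atom/ns#"
             xmlns:my="http://example.org/my">
  <atom:email>alex@milowski.com</atom:email>
  <atom:name>Alex Milowski</atom:name>
  <my:dog>Hudson</my:dog>
</atom:author>"""

print(parse_person(ET.fromstring(AUTHOR)))
```

Child order is not consulted, matching the xs:all content model.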
#15
Entries
An [atom:]entry element must have a [atom:]title and [atom:]updated child.
Order doesn't matter except that children of the same type must be grouped together.
You can put the whole content of the entry inside a "content" element:
<atom:entry>
  <atom:title>ATOM is Cool</atom:title>
  <atom:updated>2005-04-13T16:11:00-07:00</atom:updated>
  <atom:content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p>RSS is nice but ATOM is cool because I can properly embed XHTML. That means I don't need any funky business to process this text.</p>
    </div>
  </atom:content>
</atom:entry>
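Because the XHTML is real markup, one XML parse is enough to reach the paragraphs. A sketch that reads an entry of this shape; `read_entry` and the shortened sample text are assumptions, and the timestamp is parsed with `datetime.fromisoformat`, which expects a two-digit hour in the offset.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "http://purl.org/atom/ns#"  # namespace name as given in these slides
XHTML = "http://www.w3.org/1999/xhtml"

ENTRY = """<atom:entry xmlns:atom="http://purl.org/atom/ns#">
  <atom:title>ATOM is Cool</atom:title>
  <atom:updated>2005-04-13T16:11:00-07:00</atom:updated>
  <atom:content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml"><p>Properly embedded XHTML.</p></div>
  </atom:content>
</atom:entry>"""


def read_entry(entry_xml):
    """Return (title, updated as datetime, list of paragraph texts)."""
    entry = ET.fromstring(entry_xml)
    title = entry.findtext("{%s}title" % ATOM)
    updated = entry.findtext("{%s}updated" % ATOM)
    if title is None or updated is None:
        raise ValueError("an entry needs both a title and an updated child")
    paragraphs = []
    content = entry.find("{%s}content" % ATOM)
    if content is not None:
        # No second parsing pass: the paragraphs are ordinary XML elements.
        paragraphs = [p.text for p in content.iter("{%s}p" % XHTML)]
    return title, datetime.fromisoformat(updated), paragraphs


print(read_entry(ENTRY))
```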
#16
There is more to ATOM and RSS.
You need to dig into feeds and the specifications to figure out:
Which elements you want to use to encode your entries.
Which elements you want to process.
What they mean and how many you can have.
I've given you links to the specifications in this presentation.
They are actually not hard to read!