Syndication Formats

Outline (Syndication Formats)

Syndication Formats
1. RSS
2. Atom
Syndication Aggregation
1. FeedBurner
Atom Publishing Protocol (AtomPub)
Conclusions

RSS

RSS D. Mahendran: Content Syndication

(6) RSS History

The Myth of RSS Compatibility [http://diveintomark.org/archives/2004/02/04/incompatible-rss] provides a good overview
RSS is a textbook example of why standards are a good thing
- RSS 0.9 was created for the My Netscape portal in March 1999
- RSS 0.91 (a simplification) was introduced in July 1999 (as an interim solution)
- the AOL/Netscape merger removed the format from the company's portal
- RSS was left without an owner, and different parties claimed or denied ownership
- RSS 1.0 was created by an informal developer group
- RSS 0.92 (and 0.93 and 0.94) were published without acknowledging RSS 1.0
- finally, RSS 2.0 was released as a follow-up to the RSS 0.9x versions
Using RSS has become an exercise in managing a menagerie of versions

(7) RSS 0.9

RSS means RDF Site Summary (or Rich Site Summary?)
- based on an RDF draft and not compatible with the final RDF specification
- RDF was considered too cumbersome and unstable
- 0.90 (proto-RDF) was quickly replaced by the non-RDF 0.91 version
RSS 0.92+ versions were developed as unilateral specifications
- starting with RSS 0.91, RSS means Rich Site Summary
- it is no longer built on RDF, instead it simply uses XML
- the 0.9x branch eventually was renamed to RSS 2.0

(8) RSS 0.91 Example

<rss version="0.91">
 <channel>
  <title>XML.com</title>
  <link>http://www.xml.com/</link>
  <description>XML.com features a rich mix of information and services for the XML community.</description>
  <language>en-us</language>
  <item>
   <title>Normalizing XML, Part 2</title>
   <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
   <description>In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.</description>
  </item>

rss091.xml (line 2-12)

(9) RSS 1.0

RSS means RDF Site Summary (this time for real)
- based on the final RDF specification and thus incompatible with any RSS 0.9x version
- developed when the Semantic Web and RDF were first heavily marketed (1999 [http://dret.net/biblio/reference/lee99])
- RDF was expected to become the format for metadata on the Web
RSS 1.0 makes heavy use of XML Namespaces
RSS 1.0 introduces features which were not present in 0.91
- date information for published items (very relevant for news feeds)
- individual authors for various items in a feed
RSS 1.0 is the latest version of RDF-based RSS
- the Semantic Web wave is not over yet, but RDF has lost its novelty appeal
- for a more XML-oriented encoding, the RSS 0.9x branch provides a better foundation

(10) RSS 1.0 Example

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">
 <channel rdf:about="http://www.xml.com/cs/xml/query/q/19">
  <title>XML.com</title>
  <link>http://www.xml.com/</link>
  <description>XML.com features a rich mix of information and services for the XML community.</description>
  <language>en-us</language>
  <items>
   <rdf:Seq>
    <rdf:li rdf:resource="http://www.xml.com/pub/a/2002/12/04/normalizing.html"/>
    <rdf:li rdf:resource="http://www.xml.com/pub/a/2002/12/04/som.html"/>
    <rdf:li rdf:resource="http://www.xml.com/pub/a/2002/12/04/svg.html"/>
   </rdf:Seq>
  </items>
 </channel>
 <item rdf:about="http://www.xml.com/pub/a/2002/12/04/normalizing.html">
  <title>Normalizing XML, Part 2</title>
  <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
  <description>In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.</description>
  <dc:creator>Will Provost</dc:creator>
  <dc:date>2002-12-04</dc:date>
 </item>

rss10.xml (line 2-22)
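
Because RSS 1.0 relies on XML Namespaces, a reader has to address every element by namespace, and it has to look for item elements next to channel rather than inside it. The following is a minimal sketch using Python's standard library; the shortened feed string, the prefix names in NS, and the function name read_rss10_items are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the example above; the prefixes are local choices.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Shortened copy of the feed above, closed so that it is well-formed.
RSS10 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">
 <channel rdf:about="http://www.xml.com/cs/xml/query/q/19">
  <title>XML.com</title>
  <link>http://www.xml.com/</link>
 </channel>
 <item rdf:about="http://www.xml.com/pub/a/2002/12/04/normalizing.html">
  <title>Normalizing XML, Part 2</title>
  <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
  <dc:creator>Will Provost</dc:creator>
  <dc:date>2002-12-04</dc:date>
 </item>
</rdf:RDF>"""


def read_rss10_items(xml_text):
    """Return one dict per item; items are children of rdf:RDF, not of channel."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("rss:item", NS):
        items.append({
            "title": item.findtext("rss:title", default="", namespaces=NS),
            "link": item.findtext("rss:link", default="", namespaces=NS),
            "creator": item.findtext("dc:creator", default="", namespaces=NS),
            "date": item.findtext("dc:date", default="", namespaces=NS),
        })
    return items


items = read_rss10_items(RSS10)
print(items[0]["title"], items[0]["creator"], items[0]["date"])
```

The same lookups without a namespace map would find nothing, which is one practical reason why RSS 0.9x readers cannot process RSS 1.0 unchanged.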

(11) RSS 2.0

RSS now means Really Simple Syndication
- RSS 2.0 is the continuation of the 0.91 branch (which dropped RDF)
- together with RSS 1.0 it is the most popular version of RSS
- migration from 0.91 to 2.0 is easily possible
RSS 2.0 tries to avoid the use of XML Namespaces
RSS 2.0 is increasingly used with extensions [http://rss-extensions.org/wiki/Main_Page] for vendor-specific information
- the RSS core is minimal, so many applications need extensions
- many extensions have overlapping functionality
- most extensions have unclear semantics and unclear versioning policies

(12) RSS 2.0 Example

<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
 <channel>
  <title>XML.com</title>
  <link>http://www.xml.com/</link>
  <description>XML.com features a rich mix of information and services for the XML community.</description>
  <language>en-us</language>
  <item>
   <title>Normalizing XML, Part 2</title>
   <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
   <description>In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.</description>
   <dc:creator>Will Provost</dc:creator>
   <dc:date>2002-12-04</dc:date>
  </item>

rss20.xml (line 2-14)

(13) The Case for Content Management

RSS is very rarely produced by hand
- by definition, RSS contains redundant information for a specific purpose
If a Content Management System (CMS) is used, RSS can be generated
- basic metadata can be generated by the CMS (title, author, date)
- better tagging of content results in better tagging of feeds
- well-tagged feeds are better foundations for large-scale reuse of feed items
Blogging is simply a specialized case of a CMS
- Web-based interface for controlling everything
- strictly time-ordered sequence of published items
- navigation features primarily based on the time-specific facets of the blog (maybe tags)
- all blogging tools include feed support
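
Generating a feed from CMS records can be sketched as follows. This is only an illustration of the idea that title, author, and date metadata already held by a CMS map directly onto feed elements; the record layout (CHANNEL, ENTRIES) and the function name build_rss20 are hypothetical, and the output mirrors the RSS 2.0 example with Dublin Core extensions.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize the namespace with the "dc" prefix

# Hypothetical CMS records; a real system would read these from its database.
CHANNEL = {
    "title": "XML.com",
    "link": "http://www.xml.com/",
    "description": "XML.com features a rich mix of information and services "
                   "for the XML community.",
}
ENTRIES = [{
    "title": "Normalizing XML, Part 2",
    "link": "http://www.xml.com/pub/a/2002/12/04/normalizing.html",
    "author": "Will Provost",
    "date": "2002-12-04",
}]


def build_rss20(channel, entries):
    """Serialize channel metadata and entries as an RSS 2.0 document string."""
    rss = ET.Element("rss", {"version": "2.0"})
    chan = ET.SubElement(rss, "channel")
    for name in ("title", "link", "description"):
        ET.SubElement(chan, name).text = channel[name]
    for entry in entries:
        item = ET.SubElement(chan, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # Author and date are not part of this minimal core, so use Dublin Core.
        ET.SubElement(item, f"{{{DC_NS}}}creator").text = entry["author"]
        ET.SubElement(item, f"{{{DC_NS}}}date").text = entry["date"]
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_rss20(CHANNEL, ENTRIES)
print(feed_xml)
```

Building the tree with a library instead of string concatenation means that characters such as & and < in titles are escaped automatically, so the generated feed stays well-formed.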

(14) Consuming RSS

RSS feeds often have quality problems
- surprisingly often feeds do not even deliver well-formed XML
- the use of embedded markup in RSS is not well-defined
Writing an RSS reader from scratch is not a good idea
There are three major tasks which RSS readers must do
1. accept RSS feeds that are not well-formed XML and repair them
2. look at the feed contents and bring them into a unified form
3. produce a unified view of feeds regardless of the RSS version
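
Tasks 2 and 3 can be sketched with the standard library alone. The sketch below detects whether a feed is RDF-based or not and maps both onto the same list of dicts; it does not attempt task 1 and simply reports feeds that are not well-formed. The sample feeds and the names local_name and normalize_feed are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RSS20_SAMPLE = """<rss version="2.0"><channel><title>XML.com</title>
<item><title>Normalizing XML, Part 2</title>
<link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link></item>
</channel></rss>"""

RSS10_SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/">
<channel rdf:about="http://www.xml.com/"><title>XML.com</title></channel>
<item rdf:about="http://www.xml.com/pub/a/2002/12/04/normalizing.html">
<title>Normalizing XML, Part 2</title>
<link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link></item>
</rdf:RDF>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def normalize_feed(xml_text):
    """Return (version, items); raise ValueError if the feed is not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Repairing broken feeds (task 1) is out of scope for this sketch.
        raise ValueError(f"feed is not well-formed XML: {err}") from err
    if local_name(root.tag) == "RDF":
        version = "1.0"
    else:
        version = root.get("version", "unknown")
    items = []
    for elem in root.iter():
        if local_name(elem.tag) != "item":
            continue
        # Ignore namespaces so that 0.9x/2.0 and 1.0 items look the same.
        fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
        items.append({"title": fields.get("title", ""), "link": fields.get("link", "")})
    return version, items


print(normalize_feed(RSS20_SAMPLE))
print(normalize_feed(RSS10_SAMPLE))
```

Ignoring namespaces is a deliberate shortcut here; it is what makes the two formats comparable, but it would also confuse an extension element that happens to be called title or link.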

(15) RSS Technical Problems

What to put into an item's description
- the fundamental question is whether a description is text or HTML
- if there is no well-defined way, then interpretation is client-specific

<description>This is a <em>very important</em> blog post …

<description>This is a &lt;em>very important&lt;/em> blog post …

<description>This is a blog post about <em> in RSS feeds …

<description>This is a blog post about &lt;em> in RSS feeds …

<description>This is a blog post about &amp;lt;em> in RSS feeds …

Underspecified and not very robust in various other areas
- broken RSS is accepted by most readers (but fixing it can change the interpretation)
- the interpretation of relative URIs is not mentioned in the specifications
- some minimal semantics (classification) for items would be very useful
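
The description ambiguity can be made concrete with a short sketch. It parses the second and fourth variants shown above (closed so that they are well-formed) and shows that after XML parsing both contain a literal <em>, so the client has to pick a policy. The two policy functions and FEED_URL are hypothetical names; resolving relative links against the feed's own URL is an assumption, since the specifications do not define a base.

```python
import html
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Second and fourth variants from the list above, closed so that they parse.
emphasis = ET.fromstring(
    "<description>This is a &lt;em>very important&lt;/em> blog post</description>")
about_em = ET.fromstring(
    "<description>This is a blog post about &lt;em> in RSS feeds</description>")

# After parsing, both strings contain a literal "<em>"; nothing in the feed
# says whether it is markup to render or text to display.
print(emphasis.text)  # This is a <em>very important</em> blog post
print(about_em.text)  # This is a blog post about <em> in RSS feeds


def as_text(value):
    """Client policy A: the description is plain text, so escape it for display."""
    return html.escape(value, quote=False)


def as_html(value):
    """Client policy B: the description is HTML, so pass it through unchanged."""
    return value


# Policy A shows the tag literally, policy B would start an emphasis element.
print(as_text(about_em.text))
print(as_html(about_em.text))

# Relative links need a base URI that the RSS specifications do not name.
FEED_URL = "http://www.xml.com/index.rss"  # hypothetical feed location
print(urljoin(FEED_URL, "pub/a/2002/12/04/normalizing.html"))
```

Neither policy is correct for both inputs: policy B renders the first description as intended but swallows the tag name in the second, while policy A does the opposite.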

(16) RSS Political Problems

Multiple and incompatible RSS versions are still in widespread use
- RSS 1.0 and RSS 2.0 are incompatible by design (RDF vs. non-RDF)
- none of the RSS versions is maintained by a universally accepted standards body
None of the specifications is being updated or fixed
- some of the lessons learned by RSS deployment are not used in a new version
- it is unlikely that a new version will be produced which merges the RSS landscape
Invent something new instead of trying to fix RSS
- Atom started in 2003 (called Echo at first)
- W3C or IETF would have been promising candidates for a new RSS
- W3C is more formal, IETF is more developer-centered
- IETF was chosen over W3C [http://www.bestkungfu.com/?p=492] because of the Atom community's preferences

Atom

(18) Atom History

RSS's shortcomings were very apparent and could not be fixed
In mid-2003, discussions started about an improved format
It also became apparent that the format should have a protocol
Atom 0.3 was released in December 2003 but had no formal home
IETF was chosen as the new home with a working group in June 2004
RFC 4287 [http://tools.ietf.org/html/rfc4287] was published in December 2005
AtomPub was published as RFC 5023 [http://tools.ietf.org/html/rfc5023] in October 2007

(19) Atom vs. RSS

Standardized by the IETF (well-defined process)
Classification of entries (user-defined categories)
More XML-like markup design (more nesting)
Namespaces are used and supported as standard mechanism
Atom feeds must be well-formed XML (there even is a schema [http://atompub.org/2005/08/17/atom.rnc])
Interpretation of content is well-defined (various content types)
Support for xml:lang and xml:base

(20) Atom Example

<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-us">
 <title>ongoing</title>
 <id>http://www.tbray.org/ongoing/</id>
 <link rel='self' href="http://www.tbray.org/ongoing/ongoing.atom"/>
 <updated>2007-04-11T12:55:09-07:00</updated>
 <author>
  <name>Tim Bray</name>
 </author>
 <subtitle>ongoing fragmented essay by Tim Bray</subtitle>
 <entry xml:base="When/200x/2007/04/02/">
  <title>Atom Publishing Protocol Interop!</title>
  <id>http://www.tbray.org/ongoing/When/200x/2007/04/02/APP-Interop</id>
  <published>2007-04-02T13:00:00-07:00</published>
  <updated>2007-04-10T14:24:00-07:00</updated>
  <category scheme="http://www.tbray.org/ongoing/What/" term="Technology/Atom"/>
  <category scheme="http://www.tbray.org/ongoing/What/" term="Technology"/>
  <category scheme="http://www.tbray.org/ongoing/What/" term="Atom"/>
  <content type="xhtml">
   <div xmlns="http://www.w3.org/1999/xhtml">
    <p>Mark your calendar: <a href="http://www.intertwingly.net/wiki/pie/April2007Interop">April 16-17 at Google</a>. <em>Everybody</em> is invited, provided they bring along an APP implementation, client or server. This was just announced a couple of days ago, and as I write this there are already <s>six</s> twelve client and <s>seven</s> fourteen server implementations signed up to be there and try to <a href="http://www.intertwingly.net/wiki/pie/InteropGrid">fill in the grid</a>. Let’s drop some names, in alphabetical order: AOL, Flock, Google, IBM, Lotus, Microsoft, Oracle, O’Reilly, Six Apart, Sun, WordPress. Um, have I mentioned that the APP is going to be huge?</p>
   </div>
  </content>
 </entry>
</feed>

atom.xml
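
The entry in this example carries a relative xml:base, which only becomes useful once it is resolved against the feed's own location. The sketch below shows one way to do that with urljoin. The shortened feed string, the relative link href="APP-Interop", and the function name entry_links are hypothetical additions for illustration; using the rel='self' URL as the retrieval URL is also an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_NS = "http://www.w3.org/XML/1998/namespace"
ATOM = "{http://www.w3.org/2005/Atom}"

FEED_URL = "http://www.tbray.org/ongoing/ongoing.atom"  # rel='self' link above

# Shortened feed; the relative entry link is a hypothetical addition.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-us">
 <link rel="self" href="http://www.tbray.org/ongoing/ongoing.atom"/>
 <entry xml:base="When/200x/2007/04/02/">
  <title>Atom Publishing Protocol Interop!</title>
  <link href="APP-Interop"/>
 </entry>
</feed>"""


def entry_links(feed_text, retrieval_url):
    """Resolve each entry's link hrefs against the xml:base values in scope."""
    root = ET.fromstring(feed_text)
    # A missing xml:base leaves the base unchanged (urljoin with "").
    feed_base = urljoin(retrieval_url, root.get(f"{{{XML_NS}}}base", ""))
    links = []
    for entry in root.findall(f"{ATOM}entry"):
        entry_base = urljoin(feed_base, entry.get(f"{{{XML_NS}}}base", ""))
        for link in entry.findall(f"{ATOM}link"):
            links.append(urljoin(entry_base, link.get("href", "")))
    return links


print(root_lang := ET.fromstring(FEED).get(f"{{{XML_NS}}}lang"))
print(entry_links(FEED, FEED_URL))
```

With these inputs the resolved link is the same URI that the original entry uses as its id, which suggests the base is being interpreted as intended.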

(21) Atom Content

RSS had no safe way of finding out what an entry's content is
- this led to different implementations being smart about what the RSS author really wanted
- one of Atom's main goals was to improve this in a well-defined way
- Atom allows escaped markup (the only way to include non-XML HTML in an XML format)
Each content element should have a type (the default is text)
Atom's content interpretation algorithm (use first applicable rule):
1. if type is text, no child elements are allowed (plain text content)
2. if type is html then RSS's method of escaped markup is used
3. if type is xhtml then there must be a div containing XHTML markup
4. if type is an XML media type then the content should be treated as this type
5. if type starts with text/ then no child elements are allowed
6. for all other values, the content must be a Base64-encoded entity of the specified MIME type
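
The first-applicable-rule structure of this algorithm can be sketched as a small classifier. This is an illustration only: the function name classify_content and the returned labels are hypothetical, and the XML media type test is simplified to a suffix check on +xml and /xml.

```python
def classify_content(type_attr=None):
    """Map an atom:content type attribute to a handling strategy.

    The rules are tried in the order listed above; the first match wins.
    """
    kind = (type_attr or "text").strip().lower()  # the default type is text
    if kind == "text":
        return "plain text"
    if kind == "html":
        return "escaped HTML"
    if kind == "xhtml":
        return "inline XHTML div"
    media = kind.split(";", 1)[0].strip()  # drop media type parameters
    # Simplified XML media type test; this must come before the text/ rule
    # so that text/xml is treated as XML.
    if media.endswith("+xml") or media.endswith("/xml"):
        return "inline XML"
    if media.startswith("text/"):
        return "text media type"
    return "base64"


for value in (None, "html", "xhtml", "image/svg+xml", "text/xml",
              "text/plain", "image/png"):
    print(value, "->", classify_content(value))
```

The ordering matters: if the text/ rule were checked before the XML rule, text/xml content would be handled as opaque text.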

(22) Atom Content Examples

<content type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
	One <strong>bold</strong> foot forward
  </div>
</content>

[http://www.xml.com/lpt/a/1633]

<content>The "atom:content" element either contains or links to the content of the entry. The content of atom:content is Language-Sensitive.</content>

[http://www.xml.com/lpt/a/1633]

<content type="html">The &lt;code>atom:content&lt;/code> element either contains or links to the content of the entry. The content of &lt;code>atom:content&lt;/code> is &lt;a href="http://www.ietf.org/rfc/rfc3066.txt">Language-Sensitive&lt;/a>.</content>

[http://www.xml.com/lpt/a/1633]

<content type="image/png">
iVBORw0KGgoA … TAAAAAElFTkSuQmCC
</content>

[http://www.xml.com/lpt/a/1633]

<content src="image.png" type="image/png"/>

[http://www.xml.com/lpt/a/1633]

(23) Atom Categories

Atom allows categories to be assigned to entries
- each category element must have a term attribute for the category
- an optional scheme identifies the categorization scheme (ontology, taxonomy, …)
- an optional label attribute provides a human-readable label for the category
AtomPub defines a document format for Category Documents
Three different cases of categorization can be distinguished
1. use a well-known scheme (such as Dublin Core)
2. use a private but well-designed scheme (which has a URI and can be reused reliably)
3. use tags without schemes, which then are little more than content labels
Widely-known tags are not easy to handle [http://www.tbray.org/ongoing/When/200x/2007/02/01/Tag-Scheme]
- they are more than just privately assigned tags
- there is no formal scheme for them, just an emerging consensus
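
Reading categories from an entry can be sketched as follows. The first category is taken from the Atom example earlier; the second one (a bare tag with a label) is a hypothetical addition that illustrates case 3. The function name read_categories and the fallback from label to term are assumptions.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
 <category scheme="http://www.tbray.org/ongoing/What/" term="Technology/Atom"/>
 <category term="atom" label="Atom format"/>
</entry>"""


def read_categories(entry_text):
    """Return (scheme, term, label) tuples for all categories of an entry."""
    entry = ET.fromstring(entry_text)
    result = []
    for cat in entry.findall(f"{ATOM}category"):
        term = cat.get("term")
        if term is None:
            raise ValueError("category without the required term attribute")
        # scheme is optional (None marks a bare tag); fall back to the term
        # when no human-readable label is given.
        result.append((cat.get("scheme"), term, cat.get("label", term)))
    return result


for scheme, term, label in read_categories(ENTRY):
    print(scheme or "(no scheme)", term, label)
```

A consumer can only compare categories across feeds reliably when the scheme is present; two bare tags with the same term may or may not mean the same thing.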

(24) Switching from RSS to Atom

Generate both feeds but serve RSS with an HTTP redirect (301)
- old subscribers with broken clients can still use the RSS feed
- old subscribers with correct clients will use the Atom feed
Atom exposes more information than RSS (category for tags)
- the mapping of publishing info to the feed has to be changed/extended
- for standard metadata use Atom's built-in metadata elements
- for application-specific metadata consider reusing an existing metadata schema
Atom can be used to publish snippets as well as full content
- content allows any type of content to be used and may contain a complete entry
- summary is limited to a Text construct (text, html, or xhtml) and should provide a condensed version of an entry
- some Atom sources publish two feeds for summaries and content
Generate good Atom and downgrade it to RSS 1.0 & 2.0
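
One reading of the redirect strategy is a server that answers requests for the old RSS URL with a 301 pointing at the Atom feed while still sending the RSS document as the response body, so that clients ignoring the redirect keep working. The WSGI sketch below follows that reading; the paths, the URL, the placeholder feed bodies, and the names feed_app and call are all hypothetical.

```python
RSS_PATH = "/feed.rss"                       # hypothetical old feed path
ATOM_PATH = "/feed.atom"                     # hypothetical new feed path
ATOM_URL = "http://example.org/feed.atom"    # hypothetical absolute URL

# Placeholder bodies; a real site would generate both from the same content.
RSS_BODY = b'<rss version="2.0"><channel><title>Example</title></channel></rss>'
ATOM_BODY = b'<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title></feed>'


def feed_app(environ, start_response):
    """Minimal WSGI application serving both feeds."""
    path = environ.get("PATH_INFO", "")
    if path == RSS_PATH:
        # Correct clients follow Location; others still get the RSS body.
        start_response("301 Moved Permanently", [
            ("Location", ATOM_URL),
            ("Content-Type", "application/rss+xml"),
        ])
        return [RSS_BODY]
    if path == ATOM_PATH:
        start_response("200 OK", [("Content-Type", "application/atom+xml")])
        return [ATOM_BODY]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def call(path):
    """Invoke the application directly, without a network server."""
    seen = {}

    def start_response(status, headers):
        seen["status"] = status
        seen["headers"] = dict(headers)

    body = b"".join(feed_app({"PATH_INFO": path}, start_response))
    return seen["status"], seen["headers"], body


print(call(RSS_PATH)[0])
print(call(ATOM_PATH)[0])
```

The application can be mounted in any WSGI server, for example with wsgiref.simple_server.make_server from the standard library.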

Atom Publishing Protocol (AtomPub)

(33) Syndication Format Protocols

LiveJournal (very simple text-based protocol)
- not very good at handling structures (it re-invents its own encoding for structure)
Blogger (now at Google [http://code.google.com/apis/blogger/overview.html] after Google bought Pyra [http://web.archive.org/web/20031008161432/http://weblog.siliconvalley.com/column/dangillmor/archives/000802.shtml])
- no support for titles or any other sort of entry metadata
- protocol from the early days of blogging before tagging became popular
MetaWeblog [http://www.xmlrpc.com/metaWeblogApi] (an attempt to improve Blogger)
- extends Blogger using a very bad design (RSS XML as XML-RPC structure encoded as XML)
Atom Publishing Protocol (AtomPub) is an attempt to provide a clean alternative
- use the same document structures for feeds and the protocol interacting with them
- use a REST approach to provide a simple and Web-compatible protocol
- add Service Documents and Category Documents for additional tasks

(34) RESTified Syndication

Atom is a format for retrieving a set of entries as a feed document
- feeds often are time-based and are refreshed periodically or whenever needed
- feeds can use any other strategy for deciding what to publish
Read-only access to feeds should be complemented by full access
- full access needs the CUD out of the CRUD set of operations
- many Web-centric technologies try to build on the Web's REST model of interaction
AtomPub builds on Atom and adds a REST-based protocol on top of it
- POST for creating new entries (sending the request to the collection)
- PUT for updating existing entries (overwriting the existing entry)
- DELETE for deleting entries from a collection
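
The three write operations map directly onto HTTP requests. The sketch below only builds the request objects with Python's standard library and does not send them; the member URI, the entry body, and the function names are hypothetical, and the collection URI is borrowed from the service document example in these slides.

```python
from urllib.request import Request

ATOM_ENTRY_TYPE = "application/atom+xml;type=entry"

COLLECTION_URI = "http://example.org/reilly/main"   # from the service document example
MEMBER_URI = "http://example.org/reilly/main/1"     # hypothetical member URI

ENTRY_XML = ('<entry xmlns="http://www.w3.org/2005/Atom">'
             '<title>Hello</title></entry>')         # hypothetical minimal body


def create_entry(collection_uri, entry_xml):
    """POST a new entry to the collection."""
    return Request(collection_uri, data=entry_xml.encode("utf-8"),
                   headers={"Content-Type": ATOM_ENTRY_TYPE}, method="POST")


def update_entry(member_uri, entry_xml):
    """PUT a full replacement for an existing entry to its member URI."""
    return Request(member_uri, data=entry_xml.encode("utf-8"),
                   headers={"Content-Type": ATOM_ENTRY_TYPE}, method="PUT")


def delete_entry(member_uri):
    """DELETE an existing entry by its member URI."""
    return Request(member_uri, method="DELETE")


for req in (create_entry(COLLECTION_URI, ENTRY_XML),
            update_entry(MEMBER_URI, ENTRY_XML),
            delete_entry(MEMBER_URI)):
    print(req.get_method(), req.full_url)
```

Passing one of these objects to urllib.request.urlopen would actually send it; a real server would typically also require authentication, which is omitted here.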

(35) Collections, Members, Entries, Media

AtomPub's top-level concept is a collection
- collections are used for managing and organizing members
- Atom feed documents are the representation of collections
Members of a collection can be entry and media resources
- entry resources represent metadata and are represented as Atom entries
- media resources can have any media type and are the data described by entries
- a media link entry is an entry resource that contains metadata about a media resource

(36) Protocol Summary

Resource	HTTP Method	Representation	Description
Introspection	GET	Atom Service Document	Enumerates a set of collections and lists their URIs and other information about the collections
Collection	GET	Atom Feed	A list of members of the collection (this may be a subset of all entries in the collection)
Collection	POST	Atom Entry	Create a new entry in the collection
Member	GET	Atom Entry	Get the Atom Entry
Member	PUT	Atom Entry	Update the Atom Entry
Member	DELETE	n/a	Delete the Atom Entry from the collection

(37) Service Documents

Service Documents represent server-defined groups of Collections, and are used to initialize the process of creating and editing resources.

The real top-level construct of AtomPub is the workspace
- collections on a server are organized into different workspaces
- workspaces have no AtomPub semantics and no operations can be performed on them
Service documents list constraints on the members of collections
- accept specifies a comma-separated list of media ranges (with entry as special value)
- categories defines the list of categories that can be applied to members (can be fixed)
- AtomPub servers are likely to reject operations not satisfying these constraints

(38) Service Document Example

<service xmlns="http://purl.org/atom/app#" xmlns:atom="http://www.w3.org/2005/Atom">
 <workspace>
  <atom:title>Main Site</atom:title>
  <collection href="http://example.org/reilly/main">
   <atom:title>My Blog Entries</atom:title>
   <categories href="http://example.com/cats/forMain.cats"/>
  </collection>
  <collection href="http://example.org/reilly/pic">
   <atom:title>Pictures</atom:title>
   <accept>image/*</accept>
  </collection>
 </workspace>
 <workspace>
  <atom:title>Side Bar Blog</atom:title>
  <collection href="http://example.org/reilly/list">
   <atom:title>Remaindered Links</atom:title>
   <accept>entry</accept>
   <categories fixed="yes">
    <atom:category scheme="http://example.org/extra-cats/" term="joke"/>
    <atom:category scheme="http://example.org/extra-cats/" term="serious"/>
   </categories>
  </collection>
 </workspace>
</service>

atom-service.xml
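
A client typically starts by reading the service document to find out which collections exist and what they accept. The following sketch lists workspaces and collections from a shortened copy of the example above. It uses the namespace URI from that example (RFC 5023 itself uses http://www.w3.org/2007/app), and the function name list_collections is a hypothetical choice.

```python
import xml.etree.ElementTree as ET

# Namespace as used in the example above; adjust for RFC 5023 documents.
APP = "{http://purl.org/atom/app#}"
ATOM = "{http://www.w3.org/2005/Atom}"

# Shortened copy of the service document above.
SERVICE = """<service xmlns="http://purl.org/atom/app#" xmlns:atom="http://www.w3.org/2005/Atom">
 <workspace>
  <atom:title>Main Site</atom:title>
  <collection href="http://example.org/reilly/main">
   <atom:title>My Blog Entries</atom:title>
  </collection>
  <collection href="http://example.org/reilly/pic">
   <atom:title>Pictures</atom:title>
   <accept>image/*</accept>
  </collection>
 </workspace>
</service>"""


def list_collections(service_text):
    """Return one dict per collection, tagged with its workspace title."""
    root = ET.fromstring(service_text)
    found = []
    for workspace in root.findall(f"{APP}workspace"):
        ws_title = workspace.findtext(f"{ATOM}title", default="")
        for coll in workspace.findall(f"{APP}collection"):
            accept = coll.findtext(f"{APP}accept")
            found.append({
                "workspace": ws_title,
                "title": coll.findtext(f"{ATOM}title", default=""),
                "href": coll.get("href"),
                # None means the document states no accept constraint.
                "accept": accept.strip() if accept is not None else None,
            })
    return found


for coll in list_collections(SERVICE):
    print(coll["workspace"], "/", coll["title"], coll["href"], coll["accept"])
```

The href values are the URIs a client would send POST requests to when creating new members.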

(39) Category Documents

Categories are important for creating and reading entries
- they may contain metadata using any classification scheme
Service Documents contain a list of allowed categories
AtomPub defines a document format for standalone category documents
- a useful interface between AtomPub systems and other systems using classification schemes

(40) Category Document Example

<app:categories xmlns:app="http://purl.org/atom/app#" xmlns="http://www.w3.org/2005/Atom" fixed="yes" scheme="http://example.com/cats/big3">
 <category term="animal"/>
 <category term="vegetable"/>
 <category term="mineral"/>
</app:categories>

atom-category.xml
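
A category document like this one can be used to check an entry's categories before sending it to a server. The sketch below loads the document above and tests (scheme, term) pairs against it; the function names load_categories and is_acceptable are hypothetical, and treating a non-fixed list as accepting anything is an assumption about how a client might behave.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

CATEGORIES = """<app:categories xmlns:app="http://purl.org/atom/app#" xmlns="http://www.w3.org/2005/Atom" fixed="yes" scheme="http://example.com/cats/big3">
 <category term="animal"/>
 <category term="vegetable"/>
 <category term="mineral"/>
</app:categories>"""


def load_categories(doc_text):
    """Return (fixed, allowed) where allowed is a set of (scheme, term) pairs."""
    root = ET.fromstring(doc_text)
    default_scheme = root.get("scheme")
    fixed = root.get("fixed", "no") == "yes"
    # A category without its own scheme inherits the one on app:categories.
    allowed = {(cat.get("scheme", default_scheme), cat.get("term"))
               for cat in root.findall(f"{ATOM}category")}
    return fixed, allowed


def is_acceptable(scheme, term, fixed, allowed):
    """A fixed list admits only listed pairs; an open list admits anything."""
    return (not fixed) or ((scheme, term) in allowed)


fixed, allowed = load_categories(CATEGORIES)
print(is_acceptable("http://example.com/cats/big3", "animal", fixed, allowed))
print(is_acceptable("http://example.com/cats/big3", "fungus", fixed, allowed))
```

Checking on the client side avoids a round trip for entries that a server with a fixed category list is likely to reject.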

(41) Semantic Web Light

Syndication creates representations for universal concepts
Atom adds some concepts to RSS's model
Syndication revolves around the idea of interacting with items
Atom-based interaction is one way of implementing REST
For more semantics, Atom is only the foundation

Content Syndication

Web Architecture, Fall 2011, INFO 253 (CCN 42598)

Dilan Mahendran, UC Berkeley School of Information, 2011-11-08
