  1. Delicious Memex

    Introduced September 1 – Due September 22

    Technologies

    JavaScript, jQuery, Greasemonkey, JSON, Delicious API

    Description

    During the first class, students saw, and later implemented, a standalone version that lets users load bookmarks from any Delicious account and create a new trail. For this assignment, students will create a related implementation that explores trails as a mechanism for organizing information on Delicious.

    Options

    Select one of the following as a starting point:

    Notes

    On Delicious, we will mark trails with two tags: 'trail:[trail_name]' indicates that an item is part of a trail, and 'step:[number]' indicates the item's position in that trail. Each item in a trail named "Vacation in Paris," for example, has the tag 'trail:vacation_in_paris'. The first item in the trail has the tag 'step:1', the second has the tag 'step:2', and so on.
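
    As a rough illustration of the tag syntax above, the following Python sketch groups bookmarks into trails and orders each trail by its step tag. The bookmark shape (dicts with "url" and "tags" keys) and the function name build_trails are assumptions made for this example; they are not the Delicious API's format. The sketch also assumes each bookmark carries a single step tag.

```python
from collections import defaultdict


def build_trails(bookmarks):
    """Group bookmarks into trails, each ordered by its 'step:N' tag.

    `bookmarks` is a list of dicts with hypothetical keys "url" and "tags"
    (a list of tag strings). Bookmarks without a step tag are skipped.
    """
    trails = defaultdict(list)
    for bookmark in bookmarks:
        trail_names = []
        step = None
        for tag in bookmark["tags"]:
            prefix, _, value = tag.partition(":")
            if prefix == "trail" and value:
                trail_names.append(value)
            elif prefix == "step" and value.isdigit():
                step = int(value)
        if step is None:
            continue
        for name in trail_names:
            trails[name].append((step, bookmark["url"]))
    # Sort each trail by step number and keep only the URLs.
    return {name: [url for _, url in sorted(items)]
            for name, items in trails.items()}


# Made-up sample data using the "Vacation in Paris" trail from the notes.
sample_bookmarks = [
    {"url": "http://example.com/louvre",
     "tags": ["trail:vacation_in_paris", "step:2"]},
    {"url": "http://example.com/flights",
     "tags": ["trail:vacation_in_paris", "step:1", "travel"]},
    {"url": "http://example.com/unrelated", "tags": ["python"]},
]
paris_trail = build_trails(sample_bookmarks)["vacation_in_paris"]
```

    Here paris_trail lists the flights page before the Louvre page, because the step tags decide the order rather than the order in which the bookmarks were saved.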

  2. Controlled Vocabularies

    Introduced September 22 – Due October 6

    Technologies

    JavaScript, jQuery, Greasemonkey, JSON, site-specific APIs

    Description

    In his essay "Metacrap," Cory Doctorow lists seven problems with explicit metadata. Students will build or modify UI to address one of these problems, making it easier to use a controlled and consistent vocabulary.

    Students may also run experiments or analyze existing data to document how much of a problem uncontrolled vocabularies are and how much of a difference a simple fix might make.

    Options

    Select one of the following as a starting point:

    • People are lazy: Attempt to prove Doctorow wrong and show that ease of use helps with this problem. Add UI to Delicious (or some other service) that suggests tags, making it easy to follow the strict tagging principles you defined in 202 Assignment 6 last year or the vocabulary you designed in Assignment 3 this year. Or, investigate automatic tagging using the mSense API, the Times Topics API or some other approach.
    • People are stupid: There still exist lots of "Plam Treo" listings on eBay. Add UI onto eBay (or some other service) that auto-corrects spelling mistakes. Or, build a UI that suggests similar spellings that are more popular.
    • Canonical: Create a metric for the dilution experienced when several non-canonical versions of a link are saved on a service like Delicious. What would the effect be if these links were all consolidated? Or, build an extension for Delicious that automatically inserts the canonical version of a URL.
    • Or: tackle another one of Doctorow's strawmen.
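
    As one possible starting point for the "suggest similar spellings that are more popular" option above, here is a minimal Python sketch built on the standard library's difflib module. The function name suggest_popular_spelling, the popularity counts, and the 0.8 similarity cutoff are all assumptions for illustration; a real tool would take its counts from the service being extended.

```python
import difflib


def suggest_popular_spelling(term, popularity, cutoff=0.8):
    """Return a similarly spelled term that is more popular, or None.

    `popularity` maps known terms to usage counts (hypothetical data).
    `cutoff` is the minimum difflib similarity ratio for a candidate.
    """
    term = term.lower()
    candidates = difflib.get_close_matches(
        term, list(popularity), n=5, cutoff=cutoff)
    current_count = popularity.get(term, 0)
    better = [c for c in candidates
              if c != term and popularity[c] > current_count]
    if not better:
        return None
    # Among the close matches, prefer the most widely used spelling.
    return max(better, key=lambda c: popularity[c])


# Made-up listing counts, for illustration only.
listing_counts = {"palm treo": 950, "plam treo": 12, "palm pre": 400}
suggestion = suggest_popular_spelling("Plam Treo", listing_counts)
```

    With these made-up counts, the misspelled "Plam Treo" maps to "palm treo", while the correctly spelled term yields no suggestion because nothing similar is more popular.
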
  3. Semantic Web and Microformats

    Introduced October 6 – Due October 27

    Technologies

    Python, Google App Engine, RDF, RDFa, FOAF, XFN

    Description

    The semantic web promises to define content precisely and meaningfully enough that computer agents will be able to make sense of it. Some propose that RDF and SPARQL are the correct way to realize this dream, while others argue that lighter-weight microformats are more practical. Students will build tools that produce or consume RDF triples or web-based microformats.

    Options

    Select one of the following as a starting point:

    • Build a triple-store, as described in Programming the Semantic Web, on top of Google App Engine; enter some triple data; and write code to implement an interesting query (like six degrees of Kevin Bacon) as a web app. (Or modify RDFlib to use the Google App Engine datastore for persistent storage.) Describe the techniques for using Google App Engine's datastore for RDF storage, along with the advantages and disadvantages of doing so.
    • Build an interface to let iSchool users easily create their own FOAF files of iSchool contacts. Or export iSchool users' data from Facebook into FOAF or an RDF store.
    • Using the FOAF or XFN connections of iSchool faculty, staff and students, create an application that recommends new iSchool friends, or a visualization of the existing social graph using its RDF/FOAF representation. Or, create a SPARQL query to calculate centrality or degree of various iSchoolers (ask Granovetter why this might be helpful).
    • Link two or more semantic data sources together to answer some question you think is interesting. (Programming the Semantic Web has some good examples to start with, like calculating degrees of Kevin Bacon. Try using metaweb.py to access Freebase.)
    • Design a set of semantic web formats to use on the iSchool website (or some other web resource that you can edit). Choose the set of RDFa attributes that should be applied, using existing namespaces (Dublin Core, etc.) wherever possible or suggest hCard, hCal, XFN or some other microformat. For example, add semantic content so that a computer can understand class schedules or the times and locations of lectures, or add Dublin Core author information to links about papers and books written by iSchool faculty. Mark up at least a few sample pages so that some other group can build a tool to consume that information.
    • Build a tool to consume RDFa or microformatted content from iSchool pages (or, if more ambitious, arbitrary pages on the web; we could help you use 80legs to crawl a substantial portion of the web) and either visualize the data (a graph of who co-wrote a paper with which other faculty member) or draw some programmatic conclusions from it (email alerts to the dean when a lecture overlaps with a career fair).
    • Or, build your own tool to either add or consume semantic content, whether it's in RDF or microformat form.
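
    To make the triple-store option above more concrete, here is a minimal in-memory sketch in Python with a wildcard query and a breadth-first "degrees of separation" search. It is not the code from Programming the Semantic Web and does not touch the Google App Engine datastore; the class name TripleStore, the "starred_in" predicate, and the sample triples are assumptions chosen for this illustration.

```python
from collections import deque


class TripleStore:
    """A tiny in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        for triple in self._triples:
            if all(want is None or want == have
                   for want, have in zip(pattern, triple)):
                yield triple


def degrees_of_separation(store, start, goal, predicate="starred_in"):
    """Breadth-first search: number of shared-film hops from start to goal.

    Returns None when the two actors are not connected.
    """
    if start == goal:
        return 0
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        actor, distance = frontier.popleft()
        for _, _, film in store.query(subject=actor, predicate=predicate):
            for costar, _, _ in store.query(predicate=predicate, obj=film):
                if costar == goal:
                    return distance + 1
                if costar not in seen:
                    seen.add(costar)
                    frontier.append((costar, distance + 1))
    return None


# A handful of sample triples; the predicate name is hypothetical.
store = TripleStore()
store.add("Kevin Bacon", "starred_in", "Apollo 13")
store.add("Tom Hanks", "starred_in", "Apollo 13")
store.add("Tom Hanks", "starred_in", "Forrest Gump")
store.add("Sally Field", "starred_in", "Forrest Gump")

hops = degrees_of_separation(store, "Sally Field", "Kevin Bacon")
```

    The query method scans every triple, which is fine for a few hundred facts but would need indexes (or datastore queries, on App Engine) for anything larger. That trade-off is part of what the option asks you to describe.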

    Notes

    Our campus O'Reilly Safari subscription gives us all unlimited access to Programming the Semantic Web, a very valuable resource for actually writing code using RDF, FOAF and other semantic web resources.

  4. Social and Distributed Classification

    Introduced October 27 – Due November 10

    Technologies

    Python, Google App Engine, Greasemonkey, JavaScript, jQuery

    Description

    Analyze existing uses of social classification and attempt to evaluate their usefulness.

    Options

    Select one of the following as a starting point:

  5. Search and Retrieval

    Introduced November 10 – Due December 10

    Technologies

    Python, Google App Engine, Greasemonkey, JavaScript, jQuery, 80legs

    Description

    Boolean search, tf-idf, stemming, stop words, and natural language processing are all techniques used to improve access to information on the retrieval end. In this project students will create a tool that illustrates the effects of various approaches to search.
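
    As a small illustration of two of these techniques, the Python sketch below removes stop words and then weights the remaining terms with tf-idf. It uses one common variant (term frequency divided by document length, times the natural log of N over document frequency); other weightings exist. The stop-word list, function names, and sample documents are assumptions for this example.

```python
import math
from collections import Counter

# A deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    weight = (count / tokens in document) * ln(N / document frequency)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty docs
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


sample_docs = [
    "the memex is a trail of links",
    "a trail of tags",
    "search the links",
]
weights = tf_idf(sample_docs)
```

    In the first sample document, "memex" receives a higher weight than "trail" because it appears in only one of the three documents, which is the effect tf-idf is meant to capture.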

    Options

    Select one of the following as a starting point: