-
Delicious Memex
Introduced September 1 – Due September 22
Technologies
Javascript, jQuery, Greasemonkey, JSON, Delicious API
Description
During the first class, students saw and later implemented a standalone trail maker that lets users load bookmarks from any Delicious account and create a new trail. For this assignment, students will create a related implementation that explores the idea of trails as a mechanism for organizing information using Delicious.
Options
Select one of the following as a starting point:
-
Improvements to Standalone Trail Maker: Build upon the standalone version by letting people create trails from more than one user's bookmarks, search bookmarks, or browse by tag.
-
Trail Browser: Create an interface that displays all of a Delicious user's trails and lets you navigate through them. The navigation interface could be purely textual and display metadata from Delicious. It could also display bookmarked pages themselves within an iframe or by using a screencapture utility like webkit2png or khtml2png.
-
Post to Trail: Use Greasemonkey to modify the Delicious Post a Bookmark screen so that a user can add a bookmark to an existing trail or create a new one.
-
Trail Maker at Delicious: Use Greasemonkey to modify the Delicious bookmarks page to allow users to create a trail from their bookmarks within the current Delicious interface.
Notes
The syntax for trails that we will be using on Delicious is to apply 'trail:[trail_name]' to an item to indicate that it is part of a trail and 'step:[number]' to indicate where in the trail the item is positioned. Each item in a trail named "Vacation in Paris," for example, has a tag 'trail:vacation_in_paris'. The first item in the trail has a tag 'step:1', the second item has the tag 'step:2', and so on.
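As a rough illustration of the tag syntax above, here is a minimal Python sketch that groups bookmarks into ordered trails. The input shape (a list of dicts with "url" and "tags" keys) and the function name build_trails are assumptions made for this example; the actual Delicious feed may name its fields differently.

```python
import re
from collections import defaultdict

TRAIL_RE = re.compile(r"^trail:(.+)$")
STEP_RE = re.compile(r"^step:(\d+)$")


def build_trails(bookmarks):
    """Group bookmarks into trails, each ordered by its step number.

    `bookmarks` is a list of dicts with hypothetical keys "url" and "tags"
    (a list of tag strings). Returns {trail_name: [url, url, ...]}.
    """
    trails = defaultdict(list)
    for bookmark in bookmarks:
        trail_names = []
        step = None
        for tag in bookmark["tags"]:
            trail_match = TRAIL_RE.match(tag)
            step_match = STEP_RE.match(tag)
            if trail_match:
                trail_names.append(trail_match.group(1))
            elif step_match:
                step = int(step_match.group(1))
        # A bookmark without a step tag cannot be placed, so it is skipped.
        if step is None:
            continue
        # If an item carries several trail tags, the single step number is
        # applied to each of them; the tag syntax itself does not say which
        # step belongs to which trail.
        for name in trail_names:
            trails[name].append((step, bookmark["url"]))
    return {name: [url for _, url in sorted(steps)]
            for name, steps in trails.items()}


sample = [
    {"url": "http://example.com/louvre",
     "tags": ["trail:vacation_in_paris", "step:2", "museums"]},
    {"url": "http://example.com/flights",
     "tags": ["trail:vacation_in_paris", "step:1"]},
    {"url": "http://example.com/unrelated", "tags": ["recipes"]},
]
print(build_trails(sample))
```

One limitation worth noticing while designing your own implementation: because the step tag is not tied to a particular trail, a bookmark cannot hold different positions in two trails under this syntax.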
-
Controlled Vocabularies
Introduced September 22 – Due October 6
Technologies
Javascript, jQuery, Greasemonkey, JSON, site-specific APIs
Description
In Cory Doctorow's "Metacrap" essay he lists seven problems with explicit metadata. Students will build or modify UI as a way to potentially address one of these problems, making it easier to use a controlled and consistent vocabulary.
Students may also run experiments or analyze existing data to document how much of a problem uncontrolled vocabularies are and how much of a difference a simple fix might make.
Options
Select one of the following as a starting point:
-
People are lazy: Attempt to prove Doctorow wrong and show that ease-of-use will help this problem. Add UI onto Delicious (or some other service) that helpfully suggests tags to make it easy for you to follow the strict tagging principles you defined in 202 Assignment 6 last year or the vocabulary you designed in Assignment 3 this year. Or, investigate automatic tagging using the mSense API, the Times Topics API or some other approach.
-
People are stupid: There still exist lots of "Plam Treo" listings on eBay. Add UI onto eBay (or some other service) that auto-corrects spelling mistakes. Or, build a UI that suggests similar spellings that are more popular.
-
Canonical: Create a metric for the dilution experienced when several non-canonical versions of a link are saved on a service like Delicious. What would the effect be if these links were all consolidated? Or, build an extension for Delicious that automatically inserts the canonical version of a URL.
-
Or: tackle another one of Doctorow's strawmen.
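For the "People are stupid" option, one possible starting point for suggesting similar spellings that are more popular is sketched below in Python. It uses the standard library's difflib for fuzzy matching. The popularity counts, the function names, and the 0.7 similarity cutoff are all made up for illustration; a real tool would draw its counts from listing or tag data.

```python
import difflib


def suggest_spelling(term, popularity, cutoff=0.7):
    """Return a similar term that is more popular than `term`, or None.

    `popularity` maps known terms to how often they occur
    (hypothetical counts supplied by the caller).
    """
    candidates = difflib.get_close_matches(
        term, list(popularity), n=5, cutoff=cutoff)
    current = popularity.get(term, 0)
    better = [c for c in candidates
              if c != term and popularity[c] > current]
    if not better:
        return None
    # Among the close matches, prefer the most frequently used spelling.
    return max(better, key=lambda c: popularity[c])


def correct_listing(title, popularity):
    """Replace each word of a listing title with a more popular near-match."""
    words = []
    for word in title.lower().split():
        words.append(suggest_spelling(word, popularity) or word)
    return " ".join(words)


# Invented counts, for illustration only.
counts = {"palm": 950, "plam": 12, "treo": 800, "pilot": 300}
print(correct_listing("Plam Treo", counts))
```

Whether to auto-correct silently or only offer a suggestion is a UI decision this sketch leaves open; the same function supports either.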
-
Semantic Web and Microformats
Introduced October 6 – Due October 27
Technologies
Python, Google App Engine, RDF, RDFa, FOAF, XFN
Description
The semantic web promises to define content precisely and meaningfully enough that computer agents will be able to make sense of it. Some propose that RDF and SPARQL are the correct way to realize this dream, while others argue that lighter-weight microformats are more practical. Students will build tools to produce or consume either RDF triples or web-based microformats.
Options
Select one of the following as a starting point:
-
Build a triple-store as described in Programming the Semantic Web on top of Google App Engine, enter some triple data and write code to implement some interesting query (like six degrees of Kevin Bacon) as a web app. (Or modify RDFlib to use the Google App Engine datastore for persistent storage.) Describe the techniques for, and advantages and disadvantages of using Google App Engine's datastore for RDF storage.
-
Build an interface to let iSchool users easily create their own FOAF files of iSchool contacts. Or export iSchool users' data from Facebook into FOAF or an RDF store.
-
Using the FOAF or XFN connections of iSchool faculty, staff and students, create an application which recommends new iSchool friends or a visualization of the existing social graph using its RDF/FOAF representation. Or, create a SPARQL query to calculate centrality or degree of various iSchoolers (ask Granovetter why this might be helpful).
-
Link two or more semantic data sources together to answer some question you think is interesting. (Programming the Semantic Web has some good examples to start with, like calculating degrees of Kevin Bacon. Try using metaweb.py to access Freebase.)
-
Design a set of semantic web formats to use on the iSchool website (or some other web resource that you can edit). Choose the set of RDFa attributes that should be applied, using existing namespaces (Dublin Core, etc.) wherever possible or suggest hCard, hCal, XFN or some other microformat. For example, add semantic content so that a computer can understand class schedules or the times and locations of lectures, or add Dublin Core author information to links about papers and books written by iSchool faculty. Mark up at least a few sample pages so that some other group can build a tool to consume that information.
-
Build a tool to consume RDFa or microformatted content from iSchool pages (or, if more ambitious, arbitrary pages on the web -- we could help you use 80legs to crawl some substantial portion of the web) and either visualize the data (a graph of who co-wrote a paper with which other faculty member) or draw some programmatic conclusions from it (email alerts to the dean when a lecture overlaps with a career fair).
-
Or, build your own tool to either add or consume semantic content, whether it's in RDF or microformat form.
Notes
Our campus O'Reilly Safari subscription gives us all unlimited access to Programming the Semantic Web, a very valuable resource for actually writing code using RDF, FOAF and other semantic web resources.
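For the triple-store option, the following is a minimal in-memory sketch in Python, not the implementation from Programming the Semantic Web and not tied to the Google App Engine datastore. The class name TripleStore, the "starred_in" predicate, and the degrees_of_separation helper are assumptions chosen for this example. It shows wildcard pattern matching over (subject, predicate, object) triples and a breadth-first search for a Kevin Bacon style query.

```python
from collections import deque


class TripleStore:
    """A toy triple store backed by a Python set (no persistence)."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the pattern; None is a wildcard."""
        return [
            (s, p, o) for (s, p, o) in self.triples
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj is None or o == obj)
        ]


def degrees_of_separation(store, start, goal, predicate="starred_in"):
    """Breadth-first search over shared films.

    Returns the number of co-star hops from `start` to `goal`,
    or None if the two are not connected.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        actor, distance = queue.popleft()
        for _, _, film in store.query(subject=actor, predicate=predicate):
            for costar, _, _ in store.query(predicate=predicate, obj=film):
                if costar == goal:
                    return distance + 1
                if costar not in seen:
                    seen.add(costar)
                    queue.append((costar, distance + 1))
    return None


store = TripleStore()
store.add("Kevin Bacon", "starred_in", "Apollo 13")
store.add("Tom Hanks", "starred_in", "Apollo 13")
store.add("Tom Hanks", "starred_in", "Forrest Gump")
store.add("Sally Field", "starred_in", "Forrest Gump")
print(degrees_of_separation(store, "Sally Field", "Kevin Bacon"))
```

Every query here scans the whole set, which is fine for a handful of triples. A datastore-backed version would need indexes on each position of the triple, and that trade-off is part of what the option asks you to describe.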
-
Social and Distributed Classification
Introduced October 27 – Due November 10
Technologies
Python, Google App Engine, Greasemonkey, Javascript, jQuery
Description
Analyze existing uses of social classification and attempt to evaluate their usefulness.
Options
Select one of the following as a starting point:
-
Build a tool to visualize a user's Delicious or Flickr tags. Create a graph of how frequently tags are used to check for a long tail effect. Let users compare their tag distributions to other students in the class or to their friends. What calculations would you like to provide students in 202 who are working on Assignment 6?
-
For some prominent blogger that you like who classifies their own work, build a tool to analyze how their own author-chosen tags compare to the tags for that link on Delicious (or wherever).
-
Wikipedia's organization of categories and disambiguation is itself a socially determined classification and may prove very valuable. Build a Greasemonkey script that uses Wikipedia disambiguation pages to suggest narrower, less ambiguous search queries. How does your Wikipedia-generated disambiguation compare to Google's or Bing's suggested searches? There is a (long) list of Wikipedia disambiguation pages that your project can draw from. You don't necessarily have to extract all the disambiguation pages from Wikipedia to demonstrate this idea.
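For the tag-visualization option, a first pass at checking for a long tail might look like the Python sketch below. The bookmark structure (dicts with a "tags" list) and the tail_share measure are assumptions for this example rather than a prescribed metric, and the sample data is invented.

```python
from collections import Counter


def tag_distribution(bookmarks):
    """Return (tag, count) pairs sorted from most to least used.

    `bookmarks` is a list of dicts with a hypothetical "tags" key.
    """
    counts = Counter(tag for bookmark in bookmarks
                     for tag in bookmark["tags"])
    return counts.most_common()


def tail_share(ranked, head_size):
    """Fraction of all tag uses outside the `head_size` most used tags."""
    total = sum(count for _, count in ranked)
    if total == 0:
        return 0.0
    tail = sum(count for _, count in ranked[head_size:])
    return tail / total


# Invented sample data, for illustration only.
sample_bookmarks = [
    {"tags": ["python", "web"]},
    {"tags": ["python"]},
    {"tags": ["python", "maps"]},
    {"tags": ["web", "cooking"]},
]
ranked = tag_distribution(sample_bookmarks)

# A crude text rank-frequency plot: one '#' per use of the tag.
for rank, (tag, count) in enumerate(ranked, start=1):
    print(f"{rank:>3} {tag:<12} {'#' * count}")
print("share of uses outside the top 2 tags:", tail_share(ranked, 2))
```

Comparing tail_share values between two users' tag sets is one simple way to let students compare distributions; plotting rank against count on log-log axes is another.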
-
Search and Retrieval
Introduced November 10 – Due December 10
Technologies
Python, Google App Engine, Greasemonkey, Javascript, jQuery, 80legs
Description
Boolean search, tf-idf, stemming, stop words, and natural language processing are all techniques used to improve access to information on the retrieval end. In this project students will create a tool that illustrates the effects of various approaches to search.
Options
Select one of the following as a starting point:
-
Write a script in Python that automates the work done in Assignment 7 for Info 202, "Term Weighting and Ranking Calculations." The tool you write should provide a way to visualize the effects that changes in term frequency in a given document or throughout the corpus have on searches.
-
Tweet Search. Using a provided framework for Google App Engine or Javascript, write some search methods to compare the effectiveness of different search queries side-by-side when searching a user’s messages on Twitter. You could use a combination of simple boolean search, a stemming algorithm, tf-idf, or an NLP-based method to provide different results. Evaluate what, if any, benefit a vector algorithm has in displaying relevant results for a corpus with very short documents. Implementations of the Porter stemmer or Porter2 stemmer are available in various languages.
-
Build the worst possible corpus, a modern-day Library of Babel that contains real readable documents and many, many randomized versions of real documents. Can search engines (Google, a search engine that some other group builds) find the real information amongst the noise? You could use Wikipedia (or Twitter) as a starting corpus.
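For the term-weighting options, here is a small Python sketch of tf-idf scoring. tf-idf has many variants; this one assumes term frequency normalized by document length and idf = log(N / df), which may differ from the weighting used in the Info 202 assignment. Tokenization is a plain lowercase whitespace split with no stemming or stop-word removal, and the three documents are invented.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (no stemming, no stop words)."""
    return text.lower().split()


def tf_idf_scores(query, documents):
    """Score each document against the query by summing tf * idf per term.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            # Terms absent from the corpus contribute nothing.
            if doc_freq[term] == 0 or not tokens:
                continue
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            score += tf * idf
        scores.append(score)
    return scores


# Invented corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
for text, score in zip(corpus, tf_idf_scores("dog cat", corpus)):
    print(f"{score:.3f}  {text}")
```

Editing a document in the corpus and rerunning the scores is a quick way to see the effect the first option asks you to visualize: a term that appears in every document, such as "the" here, gets an idf of zero and stops affecting the ranking.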