The vocabulary problem in the wild

To illustrate the point that Bob made in class about tagging and controlled vocabularies, you may want to take a look at SociallyDelicious. Prateek, Yo-Shang and I created this tool last year in IOLab. The main idea is to see how people tag bookmarks in del.icio.us and whether their tagging behavior is influenced by their level of expertise.

You must provide a url (such as nytimes.com, 511.org or http://www.nytimes.com/interactive/2009/11/06/business/economy/unemployment-lines.html). You also need to define the ranges, in number of bookmarks, that separate "novice", "intermediate" and "experienced" users. By default, we use the following ranges: Novice (1-100), Intermediate (101-500) and Experienced (501 and up), but these can be changed every time a url is analyzed.
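As a rough illustration of the default ranges, here is a minimal Python sketch. The function name `classify_user` and its parameters are made up for this example and are not taken from the SociallyDelicious code; it assumes the ranges refer to the number of bookmarks a user has created.

```python
def classify_user(bookmark_count, novice_max=100, intermediate_max=500):
    """Place a user in one of three groups by bookmark count.

    The default cut-offs mirror the ranges in the post:
    Novice (1-100), Intermediate (101-500), Experienced (501 and up).
    Both cut-offs can be overridden for each analysis.
    """
    if bookmark_count <= novice_max:
        return "novice"
    if bookmark_count <= intermediate_max:
        return "intermediate"
    return "experienced"
```

For example, `classify_user(250)` falls in the intermediate group, while `classify_user(250, novice_max=300)` would count the same user as a novice.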

Delicious can provide us with the last 100 users who bookmarked a url. Because we analyze the level of expertise of every user, we need to ask Delicious for summary information on each user who has tagged the given url. Asking Delicious about every user takes a long time, so you can enter a number lower than 100. Be warned: the more users you request, the longer the analysis takes!

After we get the information about the users who bookmarked this url, we classify them into our 3 groups. We then calculate the average use of tags for every group and determine the total number of times each tag has been used in every group.
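The two per-group figures described above could be computed along these lines. This is only a sketch: the function name `summarize_tags` and the input shape are hypothetical, and it assumes "average use of tags" means the average number of tags a user in that group applied to the url.

```python
from collections import Counter, defaultdict


def summarize_tags(bookmarks):
    """Summarize tag usage per expertise group.

    `bookmarks` is an iterable of (group, tags) pairs, one pair per user
    who bookmarked the url (hypothetical input shape for this sketch).
    Returns (averages, totals):
      averages[group]    -> mean number of tags per user in that group
      totals[group][tag] -> number of times the tag was used in that group
    """
    tag_counts = defaultdict(Counter)  # group -> Counter of tags
    users = Counter()                  # group -> number of users seen

    for group, tags in bookmarks:
        users[group] += 1
        tag_counts[group].update(tags)

    averages = {g: sum(tag_counts[g].values()) / users[g] for g in users}
    totals = {g: dict(tag_counts[g]) for g in users}
    return averages, totals


# Made-up sample data, for illustration only.
sample = [
    ("novice", ["news"]),
    ("novice", ["news", "nyt"]),
    ("experienced", ["news", "media", "economy", "unemployment"]),
]
averages, totals = summarize_tags(sample)
```

With this sample, the novice group averages 1.5 tags per user and the experienced group averages 4.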

Finally, we use this information to plot the tag usage in every group. We build a bar chart using Flot and a tag cloud using the Google Visualization API. We also provide a graph of the top 10 most used tags for that url.

Caveats: The number of bookmarks a user has created is just one way of classifying his or her level of expertise. There may be other variables that could lead to different results and conclusions. However, with this specific classification we noticed that "experts" tend to use more tags than "novices". We also noticed that their tags tend to be both abstract and specific, whereas "novices" tend to use mostly specific tags.

Food for thought... What conclusions can you draw from these graphs (if any)?

i202 Fall 2010 School of Information, UC Berkeley
