SIMS 141: Search Engines: Technology, Society, and Business

   Speaker Schedule, Fall 2005

Assignment 1

Due October 6 at 5pm PST. (We changed this from Oct 5.)
Please do your own work.
File format: we prefer plain text or PDF files; however, you may use Word if neither of those options is viable.

(It turns out that the bspace facility doesn't work well for large classes.)

  1. People sometimes say things like the following: "Google has lots of information about X," where X is some topic of interest. Describe in a few sentences why this statement reflects a fundamental misconception.

  2. (a) Describe in one paragraph how web search engines rank search results, that is, how they decide in what order to place the web pages that match the user's query terms. Describe this in general terms; do not try to explain the exact method used by a particular engine.

    (b) The link below shows a visualization for comparing the search results produced by two search engines. (I've started it off with a sample query; it sometimes takes a few seconds to load.) The dots that are filled in with color are those pages that are found by both search engines; the empty dots are those pages that were found by only one of the two search engines. Each hit is shown in order of its ranking (left to right signifies top to bottom on a search page). The connecting lines signify which pages are found by both engines, and their relative rankings.
    http://www.langreiter.com/exec/yahoo-vs-google.html?q=berkeley

    Name one reason why two search engines might rank the same page differently, and one reason why they might not retrieve the same pages for a given query.

    (c) Compare the results of the two queries below using this visualization. (I've saved screenshots of these in case the visualization doesn't work for you.) Given what you know about ranking, why do you think the patterns are so different for these two queries?

  3. Consider the vocabulary problem that Dr. Daniel Rose mentioned. In one paragraph, describe how anchor text helps search engines deal with this problem.

  4. Dr. Pedersen and Dr. Rose both talked about difficulties associated with evaluating the effectiveness of search engine behavior. Discuss two reasons (one paragraph each) why it is difficult for web search engine developers to evaluate changes in their ranking algorithms.

  5. Below are a few queries taken from the query log of a UC Berkeley search engine, cha-cha.berkeley.edu (developed 6 years ago by Prof. Hearst and one of her master's students). The queries were issued in September 2005. Try to classify these queries according to the breakdown described in Table 1 of the paper by Rose and Levinson. Explain why you assigned each query to its class, and describe any problems you had doing this.

    Sample queries:
    • bike registration
    • david wessel
    • tele-bears
    • criminology
    • student health insurance
    • dice game

  6. What does John Battelle mean by The Database of Intentions? Do you find this to be a compelling idea? Why or why not? (Three paragraphs.)
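
Note on question 2(b): the comparison drawn by the visualization can be sketched in a few lines of Python. This is only an illustrative sketch, not the code behind the linked tool. The function name `compare_rankings` and the `*.example` result lists are hypothetical, and the sketch assumes each engine returns a list of unique URLs ordered from top hit to bottom hit.

```python
def compare_rankings(results_a, results_b):
    """Pair up the pages found by both engines with their 1-based ranks.

    Returns (shared, only_a, only_b):
      shared - list of (url, rank_in_a, rank_in_b), in engine A's order
               (the filled dots and their connecting lines)
      only_a - URLs found only by engine A (empty dots)
      only_b - URLs found only by engine B (empty dots)
    """
    # Map each URL from engine B to its rank for constant-time lookup.
    rank_b = {url: rank for rank, url in enumerate(results_b, start=1)}

    shared = [
        (url, rank_a, rank_b[url])
        for rank_a, url in enumerate(results_a, start=1)
        if url in rank_b
    ]
    shared_urls = {url for url, _, _ in shared}

    only_a = [url for url in results_a if url not in shared_urls]
    only_b = [url for url in results_b if url not in shared_urls]
    return shared, only_a, only_b


# Hypothetical result lists for one query (made-up URLs, not real output).
engine_a = ["a.example", "b.example", "c.example", "d.example"]
engine_b = ["a.example", "c.example", "e.example", "b.example"]

shared, only_a, only_b = compare_rankings(engine_a, engine_b)
for url, rank_a, rank_b in shared:
    print(f"{url}: rank {rank_a} in A, rank {rank_b} in B")
print("only in A:", only_a)
print("only in B:", only_b)
```

Each tuple in `shared` corresponds to one connecting line in the visualization; a large gap between the two ranks shows up as a steeply slanted line.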