SIMS 202 Information Organization and Retrieval 

Final Exam Preparation Guide, Fall 2003

The SIMS 202 final exam will take place Friday December 8 from 9:30am-12:30pm in 202 South Hall. This will be an open-book, open-note, and open-computer exam; you may use your own laptop or one of the machines in the computer lab. You can also write the exam by hand if you wish, so please bring your own pens/pencils (it's OK if your handwriting isn't great). We'll supply the paper as part of the exam itself. Each person will work individually. The exam period is three hours; we'll try to design it to require less than 2.5 hours, but be prepared to work quickly. If you use network-accessed material for any part of the exam, be sure to cite your sources.

The exam is comprehensive, meaning it will cover both parts of the class. However, the emphasis will be on materials covered in the second half of the course.

Each question will be worth an indicated number of points. Partial credit will be awarded. In your answers, please balance conciseness with illustration of all of the requested information. (In other words, don't write a lot of things that aren't asked for, but try to address all of what is asked for.)

To study for the exam,

  • Be sure you understand the material that was covered in lecture and have read and absorbed the corresponding material in the readings.
  • Be sure you can do activities similar to what was done in the homework assignments.
  • We will have questions that require you to generalize from what you've learned and synthesize ideas. So be sure you have thought about the ideas covered in lecture, readings, and homework assignments.

These ideas and abilities should be at your fingertips. There won't be time during the exam to do a lot of catch-up reading on topics you haven't studied.

Below are the major topics we've covered in the course, along with some example questions. Please note that these are examples of the types of questions we will ask. The second half of the course will be emphasized in the exam, but there will also be questions related to materials from the first half. The example questions are (probably) not the exact questions we will ask. We will probably ask some other types of questions too, in particular the kind where we give you an example of some information and ask you to do something with it (design an ER diagram, convert to a hierarchy, etc.).

Information Organization Topics and Example Questions (1st half)

  • Topic: Information
  • Example Questions:

What is the information life cycle?

What are different ways of measuring information? What are different ways of defining information?

What is a DTD? How do you create an XML representation of a print magazine?

  • Topic: Classification/Category Design
  • Example Questions:

What does Svenonius consider to be the primary difficulties with using controlled vocabularies?

What are the differences between how hierarchical and faceted category structures are typically used in computer interfaces? Illustrate with examples we've discussed in class or homework.

What is the relationship between attribute/value distinctions and category structure decisions?

How is a classification scheme or a thesaurus designed?

  • Topic: Human Category Structure
  • Example Questions:

What is the role of family resemblance of attributes in human category systems?

What are superordinate and subordinate categories in the human category system?

How do theories about human categorization relate to the theory, technology, and practice of information organization and knowledge representation?

  • Topic: Multimedia
  • Example Questions:

What is the “Kuleshov Effect” and how might it affect the design of metadata and information systems for multimedia data?

What is the “semantic gap” and how might it affect the design of metadata and information systems for multimedia data?

  • Topic: Metadata
  • Example Questions:

What are the motivations behind creating and using metadata systems like Dublin Core, MARC, AACR II, etc.?

What is the purpose of authority control? Is this a type of controlled vocabulary? Why or why not?

  • Topic: Database Design
  • Example Questions:

How is a database different from a file system?

What are the benefits of a database system?

What do we mean by data independence?

What are the benefits/drawbacks of the primary database models?

Entity-Relationship Diagrams -- what are they for, how do you create them?

How do you normalize a relational model database?

What is a join?

Information Retrieval Topics and Example Questions (2nd half)

  • Topic: Document Representation and Statistical Properties of Text
  • Example Questions:

What is the significance of Zipf's law for weighting of terms in information retrieval?

What kinds of errors can a stemming algorithm produce?

  • Topic: Lexical Relations
  • Example Questions:

How is polysemy different from synonymy?

Create an example that illustrates the difference between symbols and meaning, and shows their correspondence to one another.

  • Topic: Queries, Ranking, and the Vector Space Model
  • Example Questions:

What is the difference between a search engine that uses the vector space ranking algorithm on natural language queries and a system that uses Boolean queries?

What is the role of coordination level ranking in a faceted Boolean system?

Describe the following information need in terms of a faceted Boolean query. What kinds of weighting algorithms can be applied to a faceted query like this?
"I would like to find articles about the effects of the passage of the independent investigator statute by Congress on how the U.S. president chooses an attorney general."

Why do different web search engines return different sets of documents for the same query?

Redo the computations of Assignment 10 part 3 using different values for TF.
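
For practice with vector space ranking, the following is a minimal Python sketch, not the Assignment 10 computation itself. It assumes one common weighting variant (raw term frequency times log(N/df)) and cosine similarity; the formula used in class may differ, so adjust the weights accordingly. The function names and the three sample documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return (list of sparse tf*idf vectors, idf table) for a list of texts.

    Assumption: weight = raw term frequency * log(N / document frequency).
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Score every document against the query; highest score first."""
    vectors, idf = tf_idf_vectors(docs)
    q_tf = Counter(query.lower().split())
    # Query terms that never occur in the collection get weight 0.
    q = {term: tf * idf.get(term, 0.0) for term, tf in q_tf.items()}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, reverse=True)

# Hypothetical sample collection.
sample_docs = [
    "information retrieval systems",
    "database systems design",
    "information organization and retrieval",
]
ranked = rank("database design", sample_docs)
```

Changing how TF is computed (for example, using 1 + log(tf) instead of the raw count) only requires editing the weight expression in tf_idf_vectors and in rank.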

  • Topic: IR systems and Implementation
  • Example Questions:

Draw and label a diagram that shows the major components of an IR system.

What are the special features of the Cheshire II information access system?

What is the purpose of an inverted index? How is it used to generate answers to Boolean queries?

Convert the contents of a set of documents into an inverted index representation.
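
As a study aid for the two inverted index questions above, here is a minimal Python sketch. It assumes whitespace tokenization and lowercasing only (no stemming or stopword removal), and the sample documents and function names are hypothetical. A real IR system would store more in each posting, such as term frequencies or positions.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def boolean_and(index, *terms):
    """Documents containing every term: intersect the posting lists."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

def boolean_or(index, *terms):
    """Documents containing at least one term: union the posting lists."""
    result = set()
    for term in terms:
        result |= set(index.get(term.lower(), []))
    return sorted(result)

# Hypothetical sample collection.
docs = {
    1: "information retrieval systems",
    2: "database systems design",
    3: "information organization and retrieval",
}
index = build_inverted_index(docs)

print(index["systems"])                                 # [1, 2]
print(boolean_and(index, "information", "retrieval"))   # [1, 3]
print(boolean_or(index, "database", "organization"))    # [2, 3]
```

The point of the structure is that a Boolean query is answered by combining a few posting lists rather than scanning every document.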

  • Topic: Evaluation of IR Systems
  • Example Questions:

Define precision. Define recall. Define relevance. How are the three interrelated?
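
To check your understanding of the first two definitions, here is a small Python sketch using the standard set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The document IDs are made up, and the relevance judgments are simply assumed to be given.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set and a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 5 relevant overall, 2 in common.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.5 0.4
```

Retrieving more documents can only keep recall the same or raise it, but it usually lowers precision, which is one way the two measures are interrelated.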

Under what circumstances is high recall desirable? Under what circumstances is high precision desirable?

What is the main purpose of TREC? How does it differ from earlier evaluation efforts?

  • Topic: The Search Process and User Interfaces
  • Example Questions:

Search and retrieval is part of a larger process. Name some other components of that process.

How/why doesn't the Bates berry-picking model fit with the standard information retrieval model?

How (fundamentally) does search on a directory system like Yahoo differ from search on Altavista or Google?

Name the search modes discussed in the O'Day and Jeffries paper. What kinds of triggers did they find caused transitions from one search strategy to another?

Compare and contrast the current approaches to providing user interfaces for overviews of document collections.

What is the purpose of the TileBars graphical user interface? What are its strengths and weaknesses?

Compare and contrast the attempts that have been made to provide user interfaces to searching text collections in which the documents have been assigned large category hierarchies.

Practice design question: (Note on practice question: similar questions have been asked on exams in the past.)
Consider the DLITE interface for search (see page 314ff. in Modern IR). Name a type of search task, and design a DLITE workspace that would support this task. Describe the functionality it does and does not support and sketch a storyboard of a user completing two tasks using your design. Justify your decisions.

  • Topic: Relevance Feedback
  • Example Questions:

What is the main difference between relevance feedback as defined in the literature and the more current web-based notion of "more like this"?

Given a query, three documents marked as relevant, and the Rocchio formula for relevance feedback given in class, compute the vector for the new query that results.
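
This guide does not reproduce the formula given in class, so the Python sketch below assumes the common textbook form of Rocchio, new query = alpha * query + beta * (centroid of relevant documents) - gamma * (centroid of non-relevant documents). The default weights and the clipping of negative weights to zero are assumptions; substitute the values from lecture when practicing. All vectors here are made-up examples over the same term order.

```python
def rocchio(query, relevant, nonrelevant=(), alpha=1.0, beta=0.75, gamma=0.15):
    """Return the reweighted query vector under the assumed Rocchio form."""
    def centroid(vectors):
        vectors = list(vectors)
        if not vectors:
            return [0.0] * len(query)
        # Average each term weight across the given document vectors.
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel = centroid(relevant)
    non = centroid(nonrelevant)
    updated = [alpha * q + beta * r - gamma * n
               for q, r, n in zip(query, rel, non)]
    # Assumption: negative term weights are clipped to zero.
    return [max(0.0, w) for w in updated]

# Hypothetical query and three relevant documents over four terms.
query = [1.0, 0.0, 1.0, 0.0]
relevant_docs = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 3.0, 1.0, 0.0],
]
new_query = rocchio(query, relevant_docs, alpha=1.0, beta=0.5, gamma=0.0)
print(new_query)
```

With only relevant documents marked, the gamma term contributes nothing, and the new query moves toward the average of the relevant document vectors.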

The Koenemann & Belkin study found results in three conditions for relevance feedback: opaque, transparent, and penetrable. Consider the different ways people have recently implemented systems for predicting which web page to show the user next. How do the differences in these systems correspond to the different relevance feedback conditions in the K&B study?

  • Topic: Web Site Design
  • Example Questions:

What are the major steps in web site design?

How does information architecture differ from navigation structure? 

  • Topic: The Design Process
  • Example Questions:

How is the database design process similar to/different from the web site design process?

Why are sketches used by professional designers?