The exam is comprehensive, meaning it will cover both halves of the course. However, the emphasis will be on material covered in the second half.
Each question will be worth an indicated number of points. Partial credit will be awarded. In your answers, please balance conciseness with completeness. In other words, don't write a lot of things that aren't asked for, but try to address everything that is asked for.
To study for the exam, review the major topics we've covered in the course and the example questions listed below. Please note that these are examples of the types of questions we will ask; they are (probably) not the exact questions. The second half of the course will be emphasized in the exam, but there will also be questions related to material from the first half. We will probably ask some other types of questions too, in particular the kind where we give you an example of some information and ask you to do something with it (design an ER diagram, convert to a hierarchy, etc.).
What is the information life cycle?
What are different ways of measuring information? What are different ways of defining information?
What are the motivations behind creating and using metadata systems like Dublin Core, MARC, AACR II, etc.?
What is the purpose of authority control? Is this a type of controlled vocabulary? Why or why not?
What is a DTD? How do you create an XML representation of a print magazine?
What does Svenonius consider to be the primary difficulties with using controlled vocabularies?
What are the differences between how hierarchical and faceted category structures are typically used in computer interfaces? Illustrate with examples we've discussed in class or homework.
What is the relationship between attribute/value distinctions and category structure decisions?
How is a classification scheme or a thesaurus designed?
What is the role of family resemblance of attributes in human category systems?
What are superordinate and subordinate categories in the human category system?
How is a database different than a file system?
What are the benefits of a database system?
What do we mean by data independence?
What are the benefits/drawbacks of the primary database models?
Entity-Relationship Diagrams -- what are they for, how do you create them?
How do you normalize a relational model database?
What is a join?
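As a study aid for the join question above, here is a minimal sketch using Python's built-in sqlite3 module. The tables, column names, and rows are invented for illustration and do not come from the course materials. It shows an inner join, which pairs rows from two tables wherever the join condition holds and drops rows that have no match.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE book (title TEXT, author_id INTEGER)")
conn.executemany(
    "INSERT INTO author VALUES (?, ?)",
    [(1, "Author A"), (2, "Author B")],
)
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [("Book X", 1), ("Book Y", 1), ("Book Z", 3)],
)

# Inner join: combine each book with the author whose author_id matches.
rows = conn.execute(
    "SELECT book.title, author.name "
    "FROM book JOIN author ON book.author_id = author.author_id "
    "ORDER BY book.title"
).fetchall()
print(rows)  # [('Book X', 'Author A'), ('Book Y', 'Author A')]
```

"Book Z" (no author with id 3) and "Author B" (no books) do not appear in the result, because an inner join keeps only matching pairs.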
What is the significance of Zipf's law for weighting of terms in information retrieval?
What kinds of errors can a stemming algorithm produce?
How is polysemy different than synonymy?
Create an example that illustrates the difference between symbols and meaning, and shows their correspondence to one another.
What is the difference between a search engine that uses the vector space ranking algorithm on natural language queries and a system that uses Boolean queries?
What is the role of coordination level ranking in a faceted Boolean system?
Describe the following information need in terms of a faceted Boolean query. What kinds of weighting algorithms can be applied to a faceted query like this?
"I would like to find articles about the effects of the passage of the independent investigator statute by Congress on how the U.S. president chooses an attorney general."
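One way to think about the question above is sketched below. It is not the expected exam answer. The facet names, the synonym lists, and the three sample documents are all assumptions made up for illustration. A faceted Boolean query is treated here as an AND of facets, each facet being an OR of alternative terms, and coordination level ranking orders documents by how many facets they match.

```python
# Hypothetical facets for the information need; each facet is an OR of terms.
facets = {
    "statute": {"independent investigator", "independent counsel", "special prosecutor"},
    "congress": {"congress", "legislation"},
    "president": {"president", "white house"},
    "attorney general": {"attorney general", "justice department"},
}

def coordination_level(text, facets):
    """Count how many facets have at least one term occurring in the text."""
    text = text.lower()
    return sum(any(term in text for term in terms) for terms in facets.values())

# Made-up documents, only for demonstrating the ranking.
docs = {
    "A": "Congress passed the independent counsel statute, changing how "
         "the president picks an attorney general.",
    "B": "The president nominated a new attorney general.",
    "C": "Congress debated the farm bill.",
}

scores = {doc_id: coordination_level(text, facets) for doc_id, text in docs.items()}

# Strict Boolean AND: only documents matching every facet.
strict_matches = [d for d, s in scores.items() if s == len(facets)]

# Coordination level ranking: most facets matched first.
ranked = sorted(scores.items(), key=lambda item: -item[1])
print(strict_matches)  # ['A']
print(ranked)          # [('A', 4), ('B', 2), ('C', 1)]
```

The strict query returns only document A, while coordination level ranking still surfaces B and C as partial matches, ordered by the number of facets they satisfy.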
Why do different web search engines return different sets of documents for the same query?
Redo the computations of Assignment 10 part 3 using different values for TF.
Draw and label a diagram that shows the major components of an IR system.
What are the special features of the Cheshire II information access system?
What is the purpose of an inverted index? How is it used to generate answers to Boolean queries?
Convert the contents of a set of documents into an inverted index representation.
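To practice the two inverted index questions above, a small sketch may help. The three documents are invented, tokenization is a plain whitespace split (no stemming or stopword removal), and the function names are hypothetical. The index maps each term to the sorted list of document IDs containing it, and Boolean queries are answered by intersecting or merging those lists.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def boolean_and(index, term_a, term_b):
    """Documents containing both terms: intersect the two postings lists."""
    return sorted(set(index.get(term_a, [])) & set(index.get(term_b, [])))

def boolean_or(index, term_a, term_b):
    """Documents containing either term: take the union of the postings lists."""
    return sorted(set(index.get(term_a, [])) | set(index.get(term_b, [])))

# Made-up document collection.
docs = {1: "cats chase mice", 2: "dogs chase cats", 3: "mice eat cheese"}
index = build_inverted_index(docs)

print(index["chase"])                        # [1, 2]
print(boolean_and(index, "cats", "mice"))    # [1]
print(boolean_or(index, "dogs", "cheese"))   # [2, 3]
```

The query is answered from the postings lists alone, so the documents themselves are never rescanned.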
Define precision. Define recall. Define relevance. How are the three interrelated?
Under what circumstances is high recall desirable? Under what circumstances is high precision?
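For the precision and recall questions above, a short worked computation may be useful. It uses the usual set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The document IDs are made up, and the function name is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up example: 4 documents retrieved, 5 relevant overall, 2 in common.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7])
print(precision)  # 0.5  (2 of the 4 retrieved are relevant)
print(recall)     # 0.4  (2 of the 5 relevant were retrieved)
```

Retrieving more documents can only keep recall the same or raise it, but it usually lowers precision, which is one way the two measures are interrelated.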
What is the main purpose of TREC? How does it differ from earlier evaluation efforts?
Search and retrieval is part of a larger process. Name some other components of that process.
How and why does the Bates berry-picking model not fit with the standard information retrieval model?
How (fundamentally) does search on a system like Yahoo differ from search on Altavista or Google?
Name the search modes discussed in the O'Day and Jeffries paper. What kinds of triggers did they find caused transitions from one search strategy to another?
Compare and contrast the current approaches to providing user interfaces for overviews of document collections.
What is the purpose of the TileBars graphical user interface? What are its strengths and weaknesses?
Compare and contrast the attempts that have been made to provide user interfaces to searching text collections in which the documents have been assigned large category hierarchies.
Practice design question (note: similar questions have been asked on exams in the past): Consider the DLITE interface for search (see page 314ff. in Modern IR). Name a type of search task, and design a DLITE workspace that would support this task. Describe the functionality it does and does not support, and sketch a storyboard of a user completing two tasks using your design. Justify your decisions.
What is the main difference between relevance feedback as defined in the literature and the more current web-based notion of "more like this"?
Given a query, three documents marked as relevant, and the Rocchio formula for relevance feedback given in class, compute the vector for the new query that results.
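To practice the Rocchio computation above, here is a sketch under stated assumptions. It uses the commonly cited form of the formula, new query = alpha * query + beta * (centroid of relevant documents) - gamma * (centroid of non-relevant documents). The version given in class may differ in its terms or weights, so check your notes. The vectors and the values of alpha, beta, and gamma are invented for illustration.

```python
def rocchio(query, relevant_docs, nonrelevant_docs=(), alpha=1.0, beta=0.75, gamma=0.15):
    """Return the modified query vector under the commonly cited Rocchio form.

    All vectors are equal-length lists of term weights. The default weights
    are illustrative assumptions, not values from the course.
    """
    new_query = [alpha * w for w in query]
    # Add the relevant centroid (weighted by beta) and subtract the
    # non-relevant centroid (weighted by gamma).
    for docs, weight in ((relevant_docs, beta), (nonrelevant_docs, -gamma)):
        if not docs:
            continue  # no documents of this kind were judged
        for i in range(len(query)):
            centroid_i = sum(doc[i] for doc in docs) / len(docs)
            new_query[i] += weight * centroid_i
    return new_query

# Made-up query and three documents marked relevant, over a 4-term vocabulary.
query = [1, 0, 1, 0]
relevant = [[2, 1, 0, 0], [0, 1, 3, 0], [1, 1, 0, 0]]

new_query = rocchio(query, relevant, alpha=1.0, beta=0.5, gamma=0.0)
print(new_query)  # [1.5, 0.5, 1.5, 0.0]
```

Here the centroid of the three relevant documents is [1, 1, 1, 0], so each query weight moves by 0.5 times the corresponding centroid weight. The second term, which was absent from the original query, now has a nonzero weight.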
The Koenemann & Belkin study found results in three conditions for relevance feedback: opaque, transparent, and penetrable. Consider the different ways people have recently implemented systems for predicting which web page to show the user next. How do the differences in these systems correspond to the different relevance feedback conditions in the K&B study?
What are the major steps in web site design?
How does information architecture differ from navigation structure?
How is the database design process similar to/different from the web site design process?
Why are sketches used by professional designers?