SIMS 202: Information Organization and Retrieval
Midterm Exam Preparation Guide

The exam will be handed out on Thursday, October 19, at the end of class and will be due on Tuesday, October 24, at the beginning of class. This will be an open-book, open-note exam.

Each person must work individually. This means you cannot discuss the exam with anyone except Marti, Ray, or Jennifer.
To study for the exam, review the major topics shown below, which we have covered so far, along with some example questions. Please note that these are examples of the types of questions we will ask; they are (probably) not the exact questions we will ask. Furthermore, we will probably ask some other types of questions as well, in particular the kind where we give you an example of some information and ask you to do something with it.

· Topic: Document Representation and Statistical Properties of Text
· Example Questions:
What is the significance of Zipf's law for the weighting of terms in information retrieval? (A small illustrative sketch follows this topic's questions.)
What kinds of errors can a stemming algorithm produce?
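As a study aid for the Zipf's law question (this sketch is not part of the course materials), the following Python snippet counts term frequencies in a made-up toy corpus and prints rank, frequency, and their product. Under Zipf's law, rank times frequency is roughly constant, which is why the most frequent terms (and the very rare ones) tend to be poor discriminators when weighting terms.

from collections import Counter

# Toy corpus for illustration; any plain-text collection would do.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

# Raw term frequencies over the whole collection.
freqs = Counter(term for doc in docs for term in doc.split())

# Zipf's law: frequency * rank is roughly constant, so a handful of terms
# dominate the collection while most terms occur only once or twice.
ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
for rank, (term, freq) in enumerate(ranked, start=1):
    print(f"rank {rank:2d}  freq {freq:2d}  rank*freq {rank*freq:3d}  {term}")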
· Topic: Queries, Ranking, and the Vector Space Model
· Example Questions:
What is the difference between a search engine that uses the vector space ranking algorithm on natural language queries and a system that uses Boolean queries?
What is the role of coordination level ranking in a faceted Boolean system?
Describe the following information need in terms of a faceted Boolean query. What kinds of weighting algorithms can be applied to a faceted query like this? "I would like to find articles about the effects of the passage of the independent investigator statute by Congress on how the U.S. president chooses an attorney general."
Why do different web search engines return different sets of documents for the same query?
Redo the computations of Assignment 4, part 3, using different values for TF. (A small illustrative sketch follows this topic's questions.)
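For the vector space and TF questions above, here is a minimal Python sketch under my own assumptions (made-up documents, a plain term-frequency weight, and cosine similarity), not the exact formulas from class or Assignment 4. Swapping in a different TF variant, such as the log-scaled one shown, changes the scores in the way the last question asks you to explore.

import math
from collections import Counter

def tf_vector(text, log_tf=False):
    """Term-frequency vector; set log_tf=True to try an alternative TF weighting."""
    counts = Counter(text.lower().split())
    if log_tf:
        return {t: 1 + math.log(c) for t, c in counts.items()}
    return dict(counts)

def cosine(v1, v2):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

docs = {
    "d1": "congress passed the independent investigator statute",
    "d2": "the president chooses an attorney general",
    "d3": "weather report for the weekend",
}
query = "independent investigator statute president attorney general"

qvec = tf_vector(query)
for score, name in sorted(((cosine(qvec, tf_vector(d)), n) for n, d in docs.items()),
                          reverse=True):
    print(f"{name}: {score:.3f}")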
· Topic: IR Systems and Implementation
· Example Questions:
Draw and label a diagram that shows the major components of an IR system.
What are the special features of the Cheshire II information access system?
What is the purpose of an inverted index? How is it used to generate answers to Boolean queries?
Convert the contents of a set of documents into an inverted index representation. (A small illustrative sketch follows this topic's questions.)
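For the inverted index questions, here is a toy Python sketch (an illustration of the general idea, not a description of Cheshire II or any other particular system). It builds an inverted index from a small set of made-up documents and answers a Boolean AND query by intersecting posting lists.

from collections import defaultdict

docs = {
    1: "information retrieval systems index documents",
    2: "boolean queries match documents exactly",
    3: "an inverted index maps terms to documents",
}

# Inverted index: term -> set of document ids containing it (a posting list).
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def boolean_and(*terms):
    """Answer a Boolean AND query by intersecting the terms' posting lists."""
    postings = [index.get(t, set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

print(boolean_and("index", "documents"))  # -> [1, 3]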
· Topic: Evaluation of IR Systems
· Example Questions:
Define precision. Define recall. Define relevance. How are the three interrelated? (A small illustrative sketch follows this topic's questions.)
Under what circumstances is high recall desirable? Under what circumstances is high precision desirable?
What is the main purpose of TREC? How does it differ from earlier evaluation efforts?
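For the precision and recall definitions, the following Python fragment computes both measures on a made-up retrieved set and made-up relevance judgments. It also hints at the trade-off behind the "under what circumstances" questions: retrieving more documents tends to raise recall while lowering precision.

# Hypothetical retrieved set and relevance judgments, for illustration only.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7", "d9"}

hits = retrieved & relevant
precision = len(hits) / len(retrieved)  # fraction of retrieved documents that are relevant
recall = len(hits) / len(relevant)      # fraction of relevant documents that were retrieved

print(f"precision = {precision:.2f}, recall = {recall:.2f}")  # both 0.50 here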
· Topic: The Search Process and User Interfaces
· Example Questions:
Search and retrieval is part of a larger process. Name some other components of that process.
In what ways, and why, does the Bates berry-picking model not fit with the standard information retrieval model?
How (fundamentally) does search on a system like Yahoo or Looksmart differ from search on Altavista or Hotbot?
Name the search modes discussed in the O'Day and Jeffries paper. What kinds of triggers did they find caused transitions from one search strategy to another?
Compare and contrast the current approaches to providing user interfaces for overviews of document collections.
What is the purpose of the TileBars graphical user interface? What are its strengths and weaknesses? (A small illustrative sketch follows this topic's questions.)
Compare and contrast the attempts that have been made to provide user interfaces for searching text collections in which the documents have been assigned to large category hierarchies.
Practice design question (we may ask a similar question on the exam): Consider the DLITE interface for search. Name a type of search task, and design a DLITE workspace that would support this task. Describe the functionality it does and does not support, and sketch a storyboard of a user completing two tasks using your design. Justify your decisions.
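The TileBars question concerns a visualization of how query terms are distributed across the segments of each retrieved document. Purely as an illustration of the kind of data such an interface displays (this is not the TileBars implementation, and the fixed-size word segments below stand in for proper text segmentation), the following Python builds a term-by-segment hit matrix for one document.

def term_segment_matrix(text, terms, seg_len=5):
    """For each query term, record which fixed-size word segments of the
    document contain it (1 = present, 0 = absent)."""
    words = text.lower().split()
    segments = [words[i:i + seg_len] for i in range(0, len(words), seg_len)]
    return {t: [int(t in seg) for seg in segments] for t in terms}

doc = ("network traffic grew quickly last year "
       "routers dropped packets under heavy traffic "
       "the new protocol reduced congestion on the network")

matrix = term_segment_matrix(doc, ["network", "traffic", "congestion"])
for term, row in matrix.items():
    # One row per query term; each column is one segment of the document.
    print(f"{term:12s} " + " ".join("X" if hit else "." for hit in row))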
· Topic: Relevance Feedback
· Example Questions:
What is the main difference between relevance feedback as defined in the literature and the more current web-based notion of "more like this"?
Given a query, three documents marked as relevant, and the Rocchio formula for relevance feedback given in class, compute the vector for the new query that results. (A small illustrative sketch follows this topic's questions.)
The Koenemann & Belkin study reported results for three relevance feedback conditions: opaque, transparent, and penetrable. Consider the different ways people have recently implemented systems for predicting which web page to show the user next. How do the differences in these systems correspond to the different relevance feedback conditions in the K&B study?
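For the Rocchio computation question, here is a small Python sketch of the standard Rocchio update over sparse term vectors. The alpha, beta, and gamma weights and the toy vectors are made up for illustration and are not the formula or numbers given in class: the new query is the original query plus a weighted average of the relevant document vectors, minus a weighted average of the non-relevant ones.

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance feedback update; weights here are illustrative defaults."""
    terms = set(query)
    for d in relevant + nonrelevant:
        terms |= set(d)
    new_query = {}
    for t in terms:
        rel = sum(d.get(t, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
        non = sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant) if nonrelevant else 0.0
        new_query[t] = alpha * query.get(t, 0.0) + beta * rel - gamma * non
    return new_query

# A toy query and three documents marked relevant, as in the example question.
query = {"attorney": 1.0, "general": 1.0}
relevant_docs = [
    {"attorney": 2.0, "general": 1.0, "statute": 1.0},
    {"president": 1.0, "attorney": 1.0},
    {"general": 2.0, "statute": 2.0},
]
print(rocchio(query, relevant_docs, nonrelevant=[]))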