The SIMS 202 final exam will
take place Friday December 8 from
write the exam by hand if you wish, so please bring your own pens/pencils (it's
OK if your handwriting isn't great). We'll supply the paper as part of the exam
itself. Each person will work individually. The exam period is three hours;
we'll try to design it to require less than 2.5 hours, but be prepared to work
quickly. If you use network-accessed material for any part of the exam, be
sure to cite your sources.
The exam is comprehensive, meaning it will cover both parts of the class. However, the emphasis will be on materials covered in the second half of the course.
Each question will be worth an indicated number of points. Partial credit will be awarded. In your answers, please balance conciseness with illustration of all of the requested information. (In other words, don't write a lot of things that aren't asked for, but try to address all of what is asked for.)
To study for the exam,
These ideas and abilities should be at your fingertips. There won't be time during the exam to do a lot of catch-up reading on topics you haven't studied.
The major topics we've covered in the course are listed below, along with some example questions. Please note that these are examples of the types of questions we will ask. The second half of the course will be emphasized in the exam, but there will also be questions related to materials from the first half. The example questions are (probably) not the exact questions we will ask. We will probably ask some other types of questions too, in particular the kind where we give you an example of some information and ask you to do something with it (design an ER diagram, convert to a hierarchy, etc.).
What is the information life cycle?
What are different ways of measuring information? What are different ways of defining information?
What is a DTD? How do you create an XML representation of a print magazine?
What does Svenonius consider to be the primary difficulties with using controlled vocabularies?
What are the differences between how hierarchical and faceted category structures are typically used in computer interfaces? Illustrate with examples we've discussed in class or homework.
What is the relationship between attribute/value distinctions and category structure decisions?
How is a classification scheme or a thesaurus designed?
What is the role of family resemblance of attributes in human category systems?
What are superordinate and subordinate categories in the human category system?
How do theories about human categorization relate to the theory, technology, and practice of information organization and knowledge representation?
What is the “Kuleshov Effect” and how might it affect the design of metadata and information systems for multimedia data?
What is the “semantic gap” and how might it affect the design of metadata and information systems for multimedia data?
What are the motivations behind creating and using metadata systems like Dublin Core, MARC, AACR II, etc.?
What is the purpose of authority control? Is this a type of controlled vocabulary? Why or why not?
How is a database different than a file system?
What are the benefits of a database system?
What do we mean by data independence?
What are the benefits/drawbacks of the primary database models?
Entity-Relationship Diagrams -- what are they for, how do you create them?
How do you normalize a relational model database?
What is a join?
What is the significance of Zipf's law for weighting of terms in information retrieval?
What kinds of errors can a stemming algorithm produce?
How is polysemy different than synonymy?
Create an example that illustrates the difference between symbols and meaning, and shows their correspondence to one another.
What is the difference between a search engine that uses the vector space ranking algorithm on natural language queries and a system that uses Boolean queries?
What is the role of coordination level ranking in a faceted Boolean system?
Describe the following information need in terms of a faceted Boolean query. What kinds of weighting algorithms can be applied to a faceted query like this?
``I would like to find articles about the effects of the passage of the
independent investigator statute by Congress on how the
Why do different web search engines return different sets of documents for the same query?
Redo the computations of Assignment 10 part 3 using different values for TF.
Draw and label a diagram that shows the major components of an IR system.
What are the special features of the Cheshire II information access system?
What is the purpose of an inverted index? How is it used to generate answers to Boolean queries?
Convert the contents of a set of documents into an inverted index representation.
Define precision. Define recall. Define relevance. How are the three interrelated?
Under what circumstances is high recall desirable? Under what circumstances is high precision?
What is the main purpose of TREC? How does it differ from earlier evaluation efforts?
Search and retrieval is part of a larger process. Name some other components of that process.
How/why doesn't the Bates berry-picking model fit with the standard information retrieval model?
How (fundamentally) does search on a directory system like Yahoo differ from search on AltaVista or Google?
Name the search modes discussed in the O'Day and Jeffries paper. What kinds of triggers did they find caused transitions from one search strategy to another?
Compare and contrast the current approaches to providing user interfaces for overviews of document collections.
What is the purpose of the TileBars graphical user interface? What are its strengths and weaknesses?
Compare and contrast the attempts that have been made to provide user interfaces to searching text collections in which the documents have been assigned large category hierarchies.
Practice design question: (Note on practice question: similar questions have been asked on exams in the past.)
Consider the DLITE interface for search (see page 314ff. in Modern IR). Name a type of search task, and design a DLITE workspace that would support this task. Describe the functionality it does and does not support, and sketch a storyboard of a user completing two tasks using your design. Justify your decisions.
What is the main difference between relevance feedback as defined in the literature and the more current web-based notion of "more like this"?
Given a query, three documents marked as relevant, and the Rocchio formula for relevance feedback given in class, compute the vector for the new query that results.
The Koenemann & Belkin study found results in three conditions for relevance feedback: opaque, transparent, and penetrable. Consider the different ways people have recently implemented systems for predicting which web page to show the user next. How do the differences in these systems correspond to the different relevance feedback conditions in the K&B study?
What are the major steps in web site design?
How does information architecture differ from navigation structure?
How is the database design process similar to/different from the web site design process?
Why are sketches used by professional designers?