
Supplemental Readings for SIMS202
Fall 2000
Prof. Hearst and Prof. Larson

This reader provides supplemental material for SIMS202 (Information Organization and Retrieval). The course textbooks are The Organization of Information by Arlene Taylor, Libraries Unlimited, 2000, and Modern Information Retrieval by Baeza-Yates and Ribeiro-Neto (Eds.), Addison Wesley, 1999.


Information Retrieval

This chapter covers the basics of IR in a more compact and straightforward form than the first few chapters of Modern Information Retrieval.

Daniel Jurafsky and James H. Martin, Word Sense Disambiguation and Information Retrieval, Chapter 17 of Speech and Language Processing, Prentice Hall, 2000.

IR Ranking and Systems

William Cooper, Getting Beyond Boole, Information Processing and Management, 24, 243-248, 1988.

Marti Hearst, Improving Full-Text Precision on Short Queries using Simple Constraints, Proceedings of the Fifth Annual Symposium on Document Analysis and Information Retrieval (SDAIR), Las Vegas, NV, April 1996.

Ray R. Larson, et al., Cheshire II: Designing a Next-Generation Online Catalog, in Journal of the American Society for Information Science, 47(7), pp. 555-567, 1996.

Web Search and Crawling:

Allan Heydon and Marc Najork, Mercator: A Scalable, Extensible Web Crawler, World Wide Web 2(4): 219-229, 1999.

Sergey Brin and Lawrence Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, in the Proceedings of WWW7 / Computer Networks 30(1-7): 107-117, 1998.

IR Evaluation:

David C. Blair and M. E. Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document Retrieval System, in Communications of the ACM, 28(3), 1985.

Jurgen Koenemann and Nicholas J. Belkin, A Case for Interaction: A Study of Interactive Information Retrieval Behavior and Effectiveness, in the Proceedings of ACM/CHI, Vancouver, BC, pp. 205-212, 1996.

Searcher Behavior:

Marcia J. Bates, The design of browsing and berrypicking techniques for the on-line search interface, Online Review, 13 (5), 407-431, 1989.

Vicki L. O'Day and Robin Jeffries, Orienteering in an Information Landscape: How Information Seekers Get From Here to There, in Proceedings of ACM InterCHI '93, pp. 438-445, 1993.

Daniel M. Russell et al., The Cost Structure of Sensemaking, in the Proceedings of ACM/InterCHI '93, pp. 269-276, April 1993.

Collaborative Filtering:

Joseph A. Konstan et al., GroupLens: Applying Collaborative Filtering to Usenet News, in Communications of the ACM, 40(3), pp. 77-87, March 1997.

Upendra Shardanand and Pattie Maes, Social Information Filtering: Algorithms for Automating "Word of Mouth", in the Proceedings of ACM/CHI, pp. 210-217, Denver, CO, May 1995.

Classification and Categorization

Supplemental reading on cognitive aspects of categorization.

John H. Holland, Keith J. Holyoak, Richard E. Nisbett, and Paul R. Thagard, Category Formation, Chapter 6 of Induction: Processes of Inference, Learning, and Discovery, MIT Press, 1986, 1989.

This article contrasts faceted and hierarchical classifications, and subject headings vs. category codes. This and the two following papers also address the use of controlled vocabulary in search.

Marcia Bates, How to Use Controlled Vocabularies more Effectively in Online Searching, Online, November 1988, 45-56.

Elaine Svenonius, Unanswered Questions in the Design of Controlled Vocabularies, in Journal of the American Society for Information Science, 37(5), pp. 331-340, 1986.

An introduction to WordNet, a lexical thesaurus.

Christiane Fellbaum (Ed.), WordNet: An Electronic Lexical Database, MIT Press, 1998. (Introduction and Chapter 1.)

An introduction to XML and DTDs. (This will appear in the supplementary reader.)

Natanya Pitts-Moultis and Cheryl Kirk, XML Black Book, The Coriolis Group, 1999. (Chapter 5.)

Information Design

These articles discuss practice and behavior surrounding information design.

Lucy M. Berlin, Robin Jeffries, Vicki L. O'Day, Andreas Paepcke, and Cathleen Wharton, Where Did You Put It? Issues in the Design and Use of a Group Memory, Conference on Human Factors in Computing Systems, April 24-29, 1993, Amsterdam, The Netherlands.

Mark W. Newman and James A. Landay, Sitemaps, Storyboards, and Specifications: A Sketch of Web Site Design Practice, in the Proceedings of Designing Interactive Systems, DIS 2000, New York City, 2000.

The following articles describe information design methodology applied to four tasks: web site design, product design, database design, and thesaurus design.

Darrell Sano, Designing Large-Scale Web Sites: A Visual Design Methodology, John Wiley, 1996. (Chapter 3.)

(This will appear in the supplementary reader.)
John R. Hauser and Don Clausing, The House of Quality, Harvard Business Review, 66 (May-June), 63-73, 1988.

(This will appear in the supplementary reader.)
Toby J. Teorey, Database Modeling and Design, Third Edition. Morgan Kaufmann Publishers, Inc. 1999. (Chapters 1, 2, 3.0-3.3, 4, 5.0-5.2)

Dagobert Soergel, Indexing Languages and Thesauri: Construction and Maintenance, Melville Publishing Company, 1974. (Chapter F.)