Supplemental Readings for IS202, Part I
Fall 1997
Prof. Hearst and Prof. Larson

This reader provides supplemental material for the first eight weeks of IS202 (information organization). A second reader (Part II), available soon, will supplement the last eight weeks of the course and will focus on information search and retrieval.

The following material gives more background on the database material covered in weeks 2, 3, and 4 of the course. Chapter 8 discusses SQL, which will be covered in the second half of the course.

Modern Database Management, Fourth Edition, Fred McFadden and Jeffrey Hoffer, Benjamin Cummings Publishers, 1994.
Chapter 4, pages 123-146, 153-162
Chapter 6, pages 199-222
Chapter 8, pages 296-313

The following material provides supplemental reading on cognitive aspects of categorization presented in Lecture 10.
H. Clark and E. Clark, Psychology and Language: An Introduction to Psycholinguistics, Harcourt Brace Jovanovich Publishers, 1977. Excerpts: pages 462-468, 523-530, 552-554.

George Lakoff, Women, Fire, and Dangerous Things, University of Chicago Press, 1987. Excerpts: pages 31-57.

The following article contrasts faceted with hierarchical classification, and subject headings with category codes, and supplements the lectures of week 6 (lectures 11 and 12). It also addresses the use of controlled vocabulary in search, which will be revisited later in the course.
Marcia Bates, How to Use Controlled Vocabularies More Effectively in Online Searching, Online, November 1988, pages 45-56.

The following chapter supplements lecture 12 on how to build a thesaurus.
Dagobert Soergel, Indexing Languages and Thesauri: Construction and Maintenance, Melville Publishing Company, 1974. Chapter F, pages 325-345.

The following excerpts pertain to automatic content analysis. They cover the basic text analysis steps needed for content analysis and information retrieval, including automatic thesaurus generation and document clustering, and introduce the vector space model.
Gerald Salton, Automatic Text Processing, Addison Wesley, 1989. Chapter 9, "Automatic Indexing," pages 294-309. Chapter 10, "Advanced Information-Retrieval Methods," pages 313-345.

The following papers describe techniques for automatic thesaurus generation, text summarization, categorization, and information extraction. The first three use mainly statistical techniques, while the last two use robust linguistic and natural language processing techniques.
Kenneth W. Church and Patrick Hanks, Word Association Norms, Mutual Information, and Lexicography, Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, 1989, pages 76-83.

Yaacov Choueka, Looking for Needles in a Haystack, or Locating Interesting Collocational Expressions in Large Textual Databases, Proceedings of the RIAO, MIT, 1988, pages 609-623.

Hinrich Schütze, David Hull, and Jan O. Pedersen, A Comparison of Classifiers and Document Representations for the Routing Problem, Proceedings of ACM SIGIR'95, Seattle, USA, July 1995.

Paul S. Jacobs and Lisa F. Rau, Innovations in text interpretation, Artificial Intelligence, 63, pages 143-191.

F. C. Johnson, C. D. Paice, W. J. Black, and A. P. Neal, The application of linguistic processing to automatic abstract generation, Journal of Document and Text Management, 1, pages 215-241.