UC Berkeley School of Information

I 240: Information Retrieval

Textbooks and Readings

(See also Links)

Required:

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schuetze. Introduction to Information Retrieval. Cambridge University Press, 2008. (Also preprint version available online at http://www-csli.stanford.edu/~schuetze/information-retrieval-book.html)

Karen Sparck Jones and Peter Willett. Readings in Information Retrieval. San Francisco : Morgan Kaufmann, 1997 (ISBN 1-55860-454-5). Highly recommended: there will be readings from this. Parts are available through Google Books.

Optional:

David A. Grossman and Ophir Frieder. Information Retrieval: Algorithms and Heuristics. Second Edition. Dordrecht, The Netherlands: Springer, 2004 (ISBN 1-4020-3004-5).

Baeza-Yates and Ribeiro-Neto. Modern Information Retrieval, Addison Wesley, 1999.

C. J. van Rijsbergen. Information retrieval. London : Butterworths, 1975. Available through the preceding link in PDF or HTML.

William R. Hersh. Information Retrieval: A Health and Biomedical Perspective. 2nd Edition. Springer-Verlag, 2003; ISBN: 0-387-95522-4

W. Bruce Croft (ed). Advances in Information Retrieval: Recent Research from the Center for Intelligent Information Retrieval. Kluwer Academic Publishers, 2000; ISBN: 0-7923-7812-1.

Ian H. Witten, Alistair Moffat and Timothy C. Bell. _Managing Gigabytes: Compressing and Indexing Documents and Images_. 2nd Edition (Morgan Kaufmann Series in Multimedia Information and Systems). Morgan Kaufmann Publishers, 1999; ISBN: 1558605703

William B. Frakes and Ricardo Baeza-Yates. _Information retrieval: data structures & algorithms_. Englewood Cliffs, N.J. : Prentice Hall, 1992.

Gerard Salton. Automatic text processing: the transformation, analysis, and retrieval of information by computer. Reading, Mass. : Addison-Wesley, 1988. [Amazon currently lists this book as "Out of Print--Limited Availability", but it may be available used.]

Charles P. Bourne and Trudi Bellardo Hahn. A History of Online Information Services: 1963-1976. The MIT Press, 2003; ISBN: 0-262-02538-8. For those interested in the early history of online IR services.

Additional Readings:

These are readings for background and discussion. Most of these will be assigned for class discussion; others are for those who wish to dig further into particular subjects. The list is based on the textbook of readings on IR by Peter Willett and Karen Sparck Jones, with some additional items (in case you want to try to hunt down the individual papers in the readings for the course).

In addition, a digital library of early IR report and book literature is being made available through SIGIR at http://www.sigir.org/museum/.

* HISTORICAL: These items cover some early ideas and implementations that provide some of the foundations of information retrieval theory and practice.
LUHN57
Luhn, H.P. (1957). A Statistical Approach to Mechanized Encoding and Searching of Literary Information. IBM Journal of Research and Development, 1, 309-317.
FAIR58
Fairthorne, R.A. (1958). Automatic Retrieval of Recorded Information. Computer Journal, 1, 36-41. (Also in Fairthorne, R.A. (1961). Towards information retrieval. London: Butterworths).
JOYC58
Joyce, T. and Needham, R.M. (1958). The thesaurus approach to information retrieval. American Documentation, 9 (3), 192-197.
LUHN61
Luhn, H.P. (1961). The automatic derivation of information retrieval encodements from machine-readable texts. Information retrieval and machine translation (Ed A. Kent), Vol 3, Pt 2, 1021-1028; reprinted in C.K. Schultz, Ed, H.P. Luhn: Pioneer of information science, New York: Spartan Books, 1968.
MARO60
Maron, M.E. and Kuhns, J.L. (1960). On relevance, probabilistic indexing and information retrieval. Journal of the Association for Computing Machinery, 7, 216-244.
MARO61
Maron, M.E. (1961). Automatic indexing: an experimental inquiry. Journal of the Association for Computing Machinery, 8, 404-417.
DOYL62
Doyle, L.B. (1962). Indexing and abstracting by association. Part 1. SP-718/001/00, System Development Corporation, Santa Monica CA.
MARO65
Maron, M.E. (1965). Mechanised documentation: the logic behind a probabilistic interpretation. Statistical methods for mechanised documentation (Ed M.E. Stevens, V.E. Giuliano and L.B. Heilprin), National Bureau of Standards Miscellaneous Publication 269, Washington DC: US Government Printing Office, 9-13.
CLEV67
Cleverdon, C.W. (1967). The Cranfield tests on index language devices. Aslib Proceedings, 19, 1967, 173-192.
SALT68
Salton, G. and Lesk, M.E. (1968). Computer evaluation of indexing and text processing. Journal of the ACM, 15 (1), 8-36; reprinted in G. Salton, Ed, The SMART retrieval system, Englewood Cliffs NJ: Prentice-Hall, 1971, 143-180.

* KEY CONCEPTS: These papers examine the nature of documents, aboutness, indexing and index languages, requests, relevance, users and searching. Note this section deals with these topics primarily in an analytical and descriptive style, rather than by wholesale modelling of the retrieval process, covered in a later section.
HUTC78
Hutchins, W.J. (1978). The concept of `aboutness' in subject indexing. Aslib Proceedings, 30, 172-181.
CLEV63
Cleverdon, C.W. and Mills, J. (1963). The testing of index language devices. Aslib Proceedings, 15 (4), 106-130; reprinted in L.M. Chan, P.A. Richmond and E. Svenonius, Eds, Theory of Subject Analysis, Littleton CO: Libraries Unlimited, 1986, 223-246.
FOSK80
Foskett, D.J. (1980). Thesaurus. In A. Kent, H. Lancour and J.E. Daily, Eds, Encyclopedia of Library and Information Science, Vol 30, New York: Marcel Dekker, 416-462; reprinted in E.D. Dym, Ed, Subject and information analysis, New York: Marcel Dekker, 1985, 270-316.
DANI85
Daniels, P.J., Brooks, H.M. and Belkin, N.J. (1985). Using problem structures for driving human-computer dialogues. RIAO-85, Actes: Recherche d'Informations Assistee par Ordinateur, Grenoble: IMAG, 645-660.
SARA75
Saracevic, T. (1975). Relevance: a review of and a framework for the thinking on the notion in information science. Journal of the American Society for Information Science, 26 (6), 321-343.

* EVALUATION: These papers cover the notions of performance issues, criteria for performance evaluation, test design and methodology, with examples illustrating the methods.
SARA88
Saracevic, T. et al (1988). A study of information seeking and retrieving, Parts 1, 2, 3. Journal of the American Society for Information Science, 39 (3), 161-216. (Pt 1 only)
COOP73
Cooper, W.S. (1973). On selecting a measure of retrieval effectiveness. Pt 1. Journal of the American Society for Information Science, 24 (?2), 87-100.
TAGU92
Tague-Sutcliffe, J. (1992). The pragmatics of information retrieval experimentation, revisited. Information Processing and Management, 28 (4), 467-490.
KEEN92
Keen, E.M. (1992). Presenting results of experimental retrieval comparisons. Information Processing and Management, 28 (4), 491-502.
LANC69
Lancaster, F.W. (1969). MEDLARS: Report on the evaluation of its operating efficiency. American Documentation, 20 (2), 119-142; reprinted in T. Saracevic, Ed, Introduction to Information Science, New York: Bowker, 1970, 640-664.
BLAI85
Blair, D.C. and Maron, M.E. (1985). An evaluation of retrieval effectiveness for a full-text document retrieval system. Communications of the ACM, 28 (??), 289-299.
SALT86
Salton, G. (1986). Another look at text-retrieval systems. Communications of the ACM, 29 (7), 648-656.
BLAI90
Blair, D.C. and Maron, M.E. (1990). Full text information retrieval: further analysis and clarification. Information Processing and Management, 26, 437-447.
BLAI96
Blair, D.C. (1996). STAIRS redux: thoughts on the STAIRS evaluations, ten years after. Journal of the American Society for Information Science, 47, 4-22.
HARM95
Harman, D. (1995). The TREC Conferences. Hypertext - information retrieval - multimedia: synergieeffekte elektronischer informationssysteme, HIM '95, Proceedings (Ed R. Kuhlen and M. Rittberger), Konstanz: Universitaetsverlag Konstanz, 9-28.

* BASIC IR MODELS: These papers cover models of IR, both qualitative and quantitative (e.g. cognitive, statistical), concentrating on the general notions of the main IR models. Implementation issues are described later in Techniques.
ROBE77a
Robertson, S.E. (1977). Theories and models in information retrieval. Journal of Documentation, 33, 126-148.
BELK82
Belkin, N.J., Oddy, R.N. and Brooks, H.M. (1982). ASK for information retrieval: part 1. Background and theory. Journal of Documentation, 38, 61-71.
COOP88
Cooper, W.S. (1988). Getting beyond Boole. Information Processing and Management, 24, 243-248.
ROBE77b
Robertson, S.E. (1977). The probability ranking principle in IR. Journal of Documentation, 33, 294-304.
SALT75
Salton, G., Wong, A. and Yang, C.S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18 (11), 613-620.
ROBE82
Robertson, S.E., Maron, M.E. and Cooper, W.S. (1982). Probability of Relevance: A Unification of Two Competing Models for Document Retrieval. Information Technology: Research and Development, 1, 1-21.
TURT90
Turtle, H.R. and Croft, W.B. (1990). Inference networks for document retrieval. Proceedings of the 13th International Conference on Research and Development in Information Retrieval, 1-24, 1990.
VANR86
van Rijsbergen, C.J. (1986). A non-classical logic for information retrieval. Computer Journal, 29, 481-485, 1986.

* IR TECHNIQUES: These papers examine the details of various models and other specific techniques and technologies, including reports of testing.
BELK87
Belkin, N.J. and Croft, W.B. (1987). Retrieval Techniques. Annual Review of Information Science and Technology, 22, 109-145.
ROBE76
Robertson, S.E. and Sparck Jones, K. (1976). Relevance Weighting of Search Terms. Journal of the American Society for Information Science, 27 (3), 129-146.
CROF79
Croft, W.B. and Harper, D.J. (1979). Using probabilistic models of document retrieval without relevance information. Journal of Documentation, 35, 285-295.
PORT80
Porter, M.F. (1980). An algorithm for suffix stripping. Program, 14, 130-137.
ROBE94
Robertson, S.E. and Walker, S. (1994). Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. SIGIR 94 - Proceedings of the Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 232-241.
SALT88
Salton, G. and Buckley, C. (1988). Term weighting approaches in automatic text retrieval. Information Processing and Management, 24, 513-523.
SALT90
Salton, G. and Buckley, C. (1990). Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41, 288-297, 1990.
SPAR79
Sparck Jones, K. (1979). Search term relevance weighting given little relevance information. Journal of Documentation, 35 (1), 30-48.
STRZ94
Strzalkowski, T. (1994). Robust text processing in automated information retrieval. Proceedings of the 4th Conference on Applied Natural Language Processing (Stuttgart), Association for Computational Linguistics, 168-173.
GRIF86
Griffiths, A., Luckhurst, H.C. and Willett, P. (1986). Using interdocument similarity information in document retrieval systems. Journal of the American Society for Information Science, 37 (1), 3-11.
BELK92
Belkin, N.J. and Croft, W.B. (1992). Information filtering and information retrieval: two sides of the same coin? Communications of the ACM, 35 (12), 29-38.

* SYSTEMS: This section includes papers describing complete IR systems, focussing on those embodying modern views of what such systems should be like, but also illustrating the status of more `conventional' systems.
SALT83
Salton, G. and McGill, M.J. (1983). The SMART and SIRE experimental retrieval systems. In Introduction to Modern Information Retrieval, New York: McGraw-Hill, pp 118-156.
HARM92
Harman, D. (1992). User-friendly systems instead of user-friendly front-ends. Journal of the American Society for Information Science, 43 (?), 164-174.
WALK89
Walker, S. (1989). The Okapi online catalogue research projects. In The online catalogue: developments and directions (Ed C. Hildreth), London: The Library Association, 84-106.
CALL95
Callan, J., Croft, W.B. and Broglio, J. (1995). TREC and TIPSTER experiments with INQUERY. Information Processing and Management, 31 (3).
FOX87
Fox, E.A. and France, R.K. (1987). Architecture of an expert system for composite document analysis, representation and retrieval. Journal of Approximate Reasoning, 1, 151-175.
FOX88
Fox, E.A. and Koll, M.B. (1988). Practical enhanced Boolean retrieval: experiences with the SMART and SIRE systems. Information Processing and Management, 24, 257-267.
MCCU85
McCune, B.P., Tong, R. and Dean, J. (1985). RUBRIC, a system for rule-based information retrieval. IEEE Transactions on Software Engineering, SE-11 (9), 939-944.
JACO90
Jacobs, P.S. and Rau, L.F. (1990). SCISOR: extracting information from on-line news. Communications of the ACM, 33 (11), 88-97.
LARS96
Larson, R.R., McDonough, J., Kuntz, L., O'Leary, P. and Moon, R. (1996). Cheshire II: Designing a Next-Generation Online Catalog. Journal of the American Society for Information Science, 47 (7), 555-567.
TENO94
Tenopir, C. and Cahn, P. (1994). TARGET and FREESTYLE: DIALOG and Mead join the relevance ranks. Online, 18 (3), 31-47. (shorter after ads deleted)

* EXTENSIONS: These papers move outwards from the classical text document/single query situation to consider other types of `document' and other versions and aspects of the information access task. The object is to illustrate the scope of information retrieval viewed more broadly, and to draw attention to the links between retrieval and other information processing activities. At the same time, since some of the ideas and work covered here also reflect new challenges and possibilities stemming from recent technology developments, this section has papers to be taken as initial leads into the future, rather than as authoritative guides to the established wisdom.
LARS88
"Hypertext and Information Retrieval: Towards the Next Generation of Information Systems". In: Borgman, C. L. and Pai, E. Y. H. (Eds.) Information and Technology: Proceedings of the 51st ASIS Annual Meeting, Medford, NJ: Learned Information, Inc., 1988.
AGOS92
Agosti, M., Gradenigo, G. and Marchetti, P.G. (1992). A hypertext environment for interacting with large databases. Information Processing and Management, 28 (3), 371-387.
SALT94
Salton, G., Allan, J., Buckley, C. and Singhal, A. (1994). Automatic analysis, theme generation, and summarisation of machine-readable texts. Science, 264, 3 June, 1421-1426.
HULL96
Hull, D.A. and Grefenstette, G. (1996). Experiments in multilingual retrieval. Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
ROSE91
Rose, R.C. (1991). Techniques for information retrieval from speech messages. Lincoln Laboratory Journal, 4 (1), 45-59.
ZHAN95
Zhang, H.J., Low, C.Y., Smoliar, S.W. and Wu, J.H. (1995). Video parsing, retrieval and browsing: an integrated and content-based solution. Proceedings of ACM Multimedia '95, 15-24; reprinted in Intelligent multimedia information retrieval (Ed M. Maybury).
BIEB88
Biebricher, B. et al (1988). The automatic indexing system AIR/PHYS - from research to application. Eleventh International Conference on Research and Development in Information Retrieval, 333-342.
HAYE88
Hayes, P.J., Knecht, L. and Cellio, M. (1988). A news story categorisation system. Proceedings of the Second Conference on Applied Natural Language Processing, Association for Computational Linguistics, 9-17.
RAU88
Rau, L.F. (1988). Conceptual information extraction and retrieval from natural language input. RIAO 88, 424-437.
MARS84
Marsh, E., Hamburger, H. and Grishman, R. (1984). A production rule system for message summarisation. AAAI-84, Proceedings, American Association for Artificial Intelligence, 243-246.
JOHN93
Johnson, F.C., Paice, C.D., Black, W.J. and Neal, A.P. (1993). The application of linguistic processing to automatic abstract generation. Journal of Document and Text Management, 1 (3), 215-241.

* ADDITIONAL ITEMS
SWAN88
Swanson, D.R. (1988). Historical note: information retrieval and the future of an illusion. Journal of the American Society for Information Science, 39 (2), 92-98.

Handouts and Items Referenced in Lectures:

SING96
Singhal, A., Buckley, C. and Mitra, M. (1996). Pivoted Document Length Normalization. In SIGIR '96, pp. 21-29.

RAGH86
Raghavan, V.V. and Wong, S.K.M. (1986). A Critical Analysis of the Vector Space Model for Information Retrieval. Journal of the American Society for Information Science, 37(5), pp. 279-287.
SALT91
Salton, G. (1991). Developments in Automatic Text Retrieval. Science, 253 (30 Aug 1991), pp. 974-980.
COOP92
Cooper, W.S., Gey, F.C. and Dabney, D.P. (1992). Probabilistic Retrieval Based on Staged Logistic Regression. In: SIGIR '92, pp. 198-210.
PONT98
Ponte, J.M. and Croft, W.B. (1998). A Language Modelling Approach to Information Retrieval. In: SIGIR '98, pp. 275-281.
FROE94
Froehlich, Thomas J. (1994). Relevance Reconsidered -- Towards an agenda for the 21st Century: Introduction to the special issue on Relevance Research. Journal of the American Society for Information Science, 45(3) (April 1994), pp. 124-134.
SCHA90
Schamber, Linda, Eisenberg, Michael B. and Nilan, Michael S. (1990). A Re-Examination of Relevance: Toward a Dynamic Situational Definition. Information Processing and Management, 26(6), pp. 755-776.