IS 240 Textbooks and Readings
Required Text:
David A. Grossman and Ophir Frieder. Information Retrieval: Algorithms
and Heuristics. Second Edition. Dordrecht, The Netherlands: Springer, 2004
(ISBN 1-4020-3004-5).
Karen Sparck Jones and Peter Willett. Readings
in Information Retrieval. San Francisco : Morgan Kaufmann, 1997 (ISBN
1-55860-454-5)
Highly Recommended - there will be readings from this.
Optional Texts:
Baeza-Yates and Ribeiro-Neto. Modern Information Retrieval,
Addison Wesley, 1999.
C. J. van Rijsbergen. Information retrieval.
London : Butterworths, 1975. Available through the preceding link in PDF or HTML.
William R. Hersh.
Information Retrieval: A Health and Biomedical Perspective. 2nd Edition.
Springer-Verlag, 2003; ISBN: 0-387-95522-4
W. Bruce Croft (ed).
Advances in Information Retrieval: Recent Research from the Center for
Intelligent Information Retrieval. Kluwer Academic Publishers, 2000; ISBN: 0-7923-7812-1.
Ian H. Witten, Alistair Moffat and Timothy C. Bell.
Managing Gigabytes : Compressing and Indexing
Documents and Images. 2nd Edition (Morgan Kaufmann Series in
Multimedia Information and Systems)
Morgan Kaufmann Publishers, 1999; ISBN: 1558605703
William B. Frakes and Ricardo Baeza-Yates. Information retrieval:
data structures & algorithms. Englewood Cliffs, N.J. : Prentice
Hall, 1992.
Gerard Salton. Automatic text processing: the transformation, analysis,
and retrieval of information by computer. Reading, Mass. : Addison-Wesley,
1988.
[Amazon currently lists this book as "Out of Print--Limited Availability",
but it may be available used.]
Charles P. Bourne and Trudi Bellardo Hahn.
A History of Online Information Services: 1963-1976.
The MIT Press, 2003; ISBN: 0-262-02538-8.
For those interested in the early history of online IR services.
Additional Readings:
These are readings for background and discussion. Most of these
will be assigned for class discussion; others are for those who
wish to dig further into particular subjects. The list is
based on the textbook of readings on IR by Peter Willett and Karen
Sparck Jones, with some additional items (in case you want to try to
hunt down the individual papers in the readings for the course).
- * HISTORICAL: These items cover some early ideas and implementations
that provide some of the foundations of information retrieval theory and
practice.
-
- LUHN57
- Luhn, H.P. (1957). A Statistical Approach to Mechanized Encoding and
Searching of Literary Information. IBM Journal of Research and Development,
1, 309-317.
- FAIR58
- Fairthorne, R.A. (1958). Automatic Retrieval of Recorded Information.
Computer Journal, 1, 36-41. (Also in Fairthorne, R.A. (1961). Towards
information retrieval. London: Butterworths).
- JOYC58
- Joyce, T. and Needham, R.M. (1958). The thesaurus approach to information
retrieval. American Documentation, 9 (3), 192-197.
- LUHN61
- Luhn, H.P. (1961). The automatic derivation of information retrieval
encodements from machine-readable texts. Information retrieval and machine
translation (Ed A. Kent), Vol 3, Pt 2, 1021-1028; reprinted in C.K.
Schultz, Ed, H.P. Luhn: Pioneer of information science, New York:
Spartan Books, 1968,
- MARO60
- Maron, M.E. and Kuhns, J.L. (1960). On relevance, probabilistic indexing
and information retrieval. Journal of the Association for Computing
Machinery, 7, 216-244.
- MARO61
- Maron, M.E. (1961). Automatic indexing: an experimental inquiry. Journal
of the Association for Computing Machinery, 8, 404-417.
- DOYL62
- Doyle, L.B. (1962). Indexing and abstracting by association. Part
1. SP-718/001/00, System Development Corporation, Santa Monica CA.
- MARO65
- Maron, M.E. (1965). Mechanised documentation: the logic behind a probabilistic
interpretation. Statistical methods for mechanised documentation
(Ed M.E. Stevens, V.E. Giuliano and L.B. Heilprin), National Bureau of
Standards Miscellaneous Publication 269, Washington DC: US Government Printing
Office, 9-13.
- CLEV67
- Cleverdon, C.W. (1967). The Cranfield tests on index language devices.
Aslib Proceedings, 19, 173-192.
- SALT68
- Salton, G, and Lesk, M.E. (1968). Computer evaluation of indexing
and text processing. Journal of the ACM, 15 (1), 8-36; reprinted
in G. Salton, Ed, The SMART retrieval system, Englewood Cliffs NJ:
Prentice-Hall, 1971, 143-180.
- * KEY CONCEPTS: These papers examine the nature of documents, aboutness,
indexing and index languages, requests, relevance, users and searching.
Note this section deals with these topics primarily in an analytical and
descriptive style, rather than by wholesale modelling of the retrieval
process, covered in a later section.
-
- HUTC78
- Hutchins, W.J. (1978). The concept of `aboutness' in subject indexing.
Aslib Proceedings, 30, 172-181.
- CLEV63
- Cleverdon, C.W. and Mills, J. (1963). The testing of index language
devices. Aslib Proceedings, 15 (4), 106-130; reprinted in L.M. Chan,
P.A. Richmond and E. Svenonius, Eds, Theory of Subject Analysis,
Littleton CO: Libraries Unlimited, 1986, 223-246.
- FOSK80
- Foskett, D.J. (1980). Thesaurus. In A. Kent, H. Lancour and J.E. Daily,
Eds, Encyclopedia of Library and Information Science, Vol 30, New
York: Marcel Dekker, 416-462; reprinted in E.D. Dym, Ed, Subject and
information analysis, New York: Marcel Dekker, 1985, 270-316.
- DANI85
- Daniels, P.J., Brooks, H.M. and Belkin, N.J. (1985). Using problem
structures for driving human-computer dialogues. RIAO-85, Actes: Recherche
d'Informations Assistee par Ordinateur, Grenoble: IMAG, 645-660.
- SARA75
- Saracevic, T. (1975). Relevance: a review of and a framework for the
thinking on the notion in information science. Journal of the American
Society for Information Science, 26 (6), 321-343.
- * EVALUATION: These papers cover the notions of performance issues,
criteria for performance evaluation, test design and methodology, with
examples illustrating the methods.
-
- SARA88
- Saracevic, T. et al (1988). A study of information seeking and retrieving,
Parts 1,2,3. Journal of the American Society for Information Science,
39 (3), 161-216. Pt 1 only
- COOP73
- Cooper, W.S. (1973). On selecting a measure of retrieval effectiveness.
Pt 1. Journal of the American Society for Information Science, 24
(?2), 87-100.
- TAGU92
- Tague-Sutcliffe, J. (1992). The pragmatics of information retrieval
experimentation, revisited. Information Processing and Management,
28 (4), 467-490.
- KEEN92
- Keen, E.M. (1992). Presenting results of experimental retrieval comparisons.
Information Processing and Management, 28 (4), 491-502.
- LANC69
- Lancaster, F.W. (1969). MEDLARS: Report on the evaluation of its operating
efficiency. American Documentation, 20 (2), 119-142; reprinted in
T. Saracevic, Ed, Introduction to Information Science, New York:
Bowker, 1970, 640-664.
- BLAI85
- Blair, D.C. and Maron, M.E. (1985). An evaluation of retrieval effectiveness
for a full-text document retrieval system. Communications of the ACM,
28 (??), 289-299.
- SALT86
- Salton, G. (1986). Another look at text-retrieval systems. Communications
of the ACM, 29(7), 648-656.
- BLAI90
- Blair, D.C. and Maron, M.E. (1990). Full text information retrieval:
further analysis and clarification. Information Processing and Management,
26, 437-447.
- BLAI96
- Blair, D.C. (1996). STAIRS redux: thoughts on the STAIRS evaluations,
ten years after. Journal of the American Society for Information Science,
47, 4-22.
- HARM95
- Harman, D. (1995). The TREC Conferences. Hypertext - information
retrieval - multimedia: synergieeffekte elektronischer informationssysteme,
HIM '95, Proceedings (Ed R. Kuhlen and M. Rittberger), Konstanz: Universitaetsverlag
Konstanz, 9-28.
- * BASIC IR MODELS: These papers cover models of IR, both qualitative
and quantitative (eg cognitive, statistical), concentrating on the general
notions of the main IR models. Implementation issues are described later
in Techniques.
-
- ROBE77a
- Robertson, S.E. (1977). Theories and models in information retrieval.
Journal of Documentation, 33, 126-148.
- BELK82
- Belkin, N.J., Oddy, R.N. and Brooks, H.M. (1982). ASK for information
retrieval: part 1. Background and theory. Journal of Documentation,
38, 61-71.
- COOP88
- Cooper, W.S. (1988). Getting beyond Boole. Information Processing
and Management, 24, 243-248.
- ROBE77b
- Robertson, S.E. (1977). The probability ranking principle in IR. Journal
of Documentation, 33, 294-304.
- SALT75
- Salton, G., Wong, A. and Yang, C.S. (1975). A vector space model for
automatic indexing. Communications of the ACM, 18 (11), 613-620.
- ROBE82
- Robertson, S.E., Maron, M.E. and Cooper, W.S. (1982). Probability
of Relevance: A Unification of Two Competing Models for Document Retrieval.
Information Technology: Research and Development, 1, 1-21.
- TURT90
- Turtle, H.R. and Croft, W.B. (1990). Inference networks for document
retrieval. Proceedings of the 13th International Conference on Research
and Development in Information Retrieval, 1-24.
- VANR86
- van Rijsbergen, C.J. (1986). A non-classical logic for information
retrieval. Computer Journal, 29, 481-485.
- * IR TECHNIQUES: These papers examine the details of various models
and other specific techniques and technologies, including reports of testing.
-
- BELK87
- Belkin, N.J. and Croft, W.B. (1987). Retrieval Techniques. Annual
Review of Information Science and Technology, 22, 109-145.
- ROBE76
- Robertson, S.E. and Sparck Jones, K. (1976). Relevance Weighting of
Search Terms. Journal of the American Society for Information Science,
27(3), 129-146.
- CROF79
- Croft, W.B. and Harper, D.J. (1979). Using probabilistic models of
document retrieval without relevance information. Journal of Documentation,
35, 285-295.
- PORT80
- Porter, M.F. (1980). An algorithm for suffix stripping. Program,
14, 130-137.
- ROBE94
- Robertson, S.E. and Walker, S. (1994). Some simple effective approximations
to the 2-Poisson model for probabilistic weighted retrieval. SIGIR 94 -
Proceedings of the Seventeenth Annual International ACM SIGIR Conference
on Research and Development in Information Retrieval, 232-241.
- SALT88
- Salton, G. and Buckley, C. (1988). Term weighting approaches in automatic
text retrieval. Information Processing and Management, 24, 513-523.
- SALT90
- Salton, G. and Buckley, C. (1990). Improving retrieval performance
by relevance feedback. Journal of the American Society for Information
Science, 41, 288-297.
- SPAR79
- Sparck Jones, K. (1979). Search term relevance weighting given little
relevance information. Journal of Documentation, 35 (1), 30-48.
- STRZ94
- Strzalkowski, T. (1994). Robust text processing in automated information
retrieval. Proceedings of the 4th Conference on Applied Natural Language
Processing (Stuttgart), Association for Computational Linguistics, 168-173.
- GRIF86
- Griffiths, A., Luckhurst, H.C. and Willett, P. (1986). Using interdocument
similarity information in document retrieval systems. Journal of the
American Society for Information Science, 37 (1), 3-11.
- BELK92
- Belkin, N.J. and Croft, W.B. (1992). Information filtering and information
retrieval: two sides of the same coin? Communications of the ACM,
35(12), 29-38.
- * SYSTEMS: This section includes papers describing complete IR
systems, focussing on those embodying modern views of what such systems
should be like, but also illustrating the status of more `conventional'
systems.
-
- SALT83
- Salton, G. and McGill, M.J. (1983). The SMART and SIRE experimental
retrieval systems. In Introduction To Information Retrieval, New
York, McGraw-Hill, pp 118-156.
- HARM92
- Harman, D. (1992). User-friendly systems instead of user-friendly
front-ends. Journal of the American Society for Information Science,
43 (?), 164-174.
- WALK89
- Walker, S. (1989). The Okapi online catalogue research projects. In
The online catalogue: developments and directions (Ed C. Hildreth),
London: The Library Association, 84-106.
- CALL95
- Callan, J., Croft, W.B. and Broglio, J. (1995). TREC and TIPSTER experiments
with INQUERY. Information Processing and Management, 31 (3).
- FOX87
- Fox, E.A. and France, R.K. (1987). Architecture of an expert system
for composite document analysis, representation and retrieval. Journal
of Approximate Reasoning, 1, 151-175.
- FOX88
- Fox, E.A. and Koll, M.B. (1988). Practical enhanced Boolean retrieval:
experiences with the SMART and SIRE systems. Information Processing
and Management, 24, 257-267.
- MCCU85
- McCune, B.P., Tong, R. and Dean, J. (1985). RUBRIC, a system for rule-based
information retrieval. IEEE Transactions on Software Engineering, SE-11 (9),
939-944.
- JACO90
- Jacobs, P.S. and Rau, L.F. (1990). SCISOR: extracting information
from on-line news. Communications of the ACM, 33(11), 88-97.
- LARS96
- Larson, R.R., McDonough, J., Kuntz, L., O'Leary, P. and Moon, R. ``Cheshire
II: Designing a Next-Generation Online Catalog.'' Journal of the American
Society for Information Science, 47(7) (July 1996), p. 555-567.
- TENO94
- Tenopir, C. and Cahn, P. (1994). TARGET and FREESTYLE: DIALOG and
Mead join the relevance ranks. Online, 18 (3), 31-47. (shorter after
ads deleted)
- * EXTENSIONS: These papers move outwards from the classical text
document/single query situation to consider other types of `document' and
other versions and aspects of the information access task. The object is
to illustrate the scope of information retrieval viewed more broadly, and
to draw attention to the links between retrieval and other information
processing activities. At the same time, since some of the ideas and work
covered here also reflect new challenges and possibilities stemming from
recent technology developments, this section has papers to be taken as
initial leads into the future, rather than as authoritative guides to the
established wisdom.
-
- LARS88
- Larson, R.R. (1988). ``Hypertext and Information Retrieval: Towards the Next Generation
of Information Systems''. In: Borgman, C. L. and Pai, E. Y. H. (Eds.) Information
and Technology: Proceedings of the 51st ASIS Annual Meeting, Medford,
NJ: Learned Information, Inc., 1988.
- AGOS92
- Agosti, M., Gradenigo, G. and Marchetti, P.G. (1992). A hypertext environment
for interacting with large databases. Information Processing and Management,
28 (3), 371-387.
- SALT94
- Salton, G., Allan, J., Buckley, C. and Singhal, A. (1994). Automatic
analysis, theme generation, and summarisation of machine-readable texts.
Science, 264, 3 June, 1421-1426.
- HULL96
- Hull, D.A. and Grefenstette, G. (1996). Experiments in multilingual
retrieval. Proceedings of the 19th Annual International ACM SIGIR Conference
on Research and Development in Information Retrieval.
- ROSE91
- Rose, R.C. (1991). Techniques for information retrieval from speech
messages. Lincoln Laboratory Journal, 4 (1), 45-59.
- ZHAN95
- Zhang, H.J., Low, C.Y., Smoliar, S.W. and Wu, J.H. (1995). Video parsing,
retrieval and browsing: an integrated and content-based solution. Proceedings
of ACM Multimedia '95, 15-24; reprinted in Intelligent multimedia
information retrieval (Ed M. Maybury).
- BIEB88
- Biebricher, B. et al (1988). The automatic indexing system AIR/PHYS -
from research to application. Eleventh International Conference on
Research and Development in Information Retrieval, 333-342.
- HAYE88
- Hayes, P.J., Knecht, L. and Cellio, M. (1988). A news story categorisation
system. Proceedings of the Second Conference on Applied Natural Language
Processing, Association for Computational Linguistics, 9-17.
- RAU88
- Rau, L.F. (1988). Conceptual information extraction and retrieval
from natural language input. RIAO 88, 424-437.
- MARS84
- Marsh, E., Hamburger, H. and Grishman, R. (1984). A production rule
system for message summarisation. AAAI-84, Proceedings, American
Association for Artificial Intelligence, 243-246.
- JOHN93
- Johnson, F.C., Paice, C.D., Black, W.J. and Neal, A.P. (1993). The
application of linguistic processing to automatic abstract generation.
Journal of Document and Text Management, 1 (3), 215-241.
- * ADDITIONAL ITEMS
-
- SWAN88
- Swanson, D.R. (1988). Historical note: information retrieval and the
future of an illusion. Journal of the American Society for Information
Science, 39 (2), 92-98.
- SING96
- Singhal, A., Buckley, C. and Mitra, M. (1996). Pivoted Document Length Normalization. In SIGIR '96, pp. 21-29.
- RAGH86
- Raghavan, V.V. and Wong, S.K.M. (1986). A Critical Analysis of the
Vector Space Model for Information Retrieval. Journal of the American
Society for Information Science, 37(5), pp. 279-287.
- SALT91
- Salton, G. (1991). Developments in Automatic Text Retrieval. Science, 253
(30 Aug 1991), pp. 974-980.
- COOP92
- Cooper, W.S., Gey, F.C. and Dabney, D.P. (1992). Probabilistic Retrieval Based
on Staged Logistic Regression. In: SIGIR '92, pp. 198-210.
- PONT98
- Ponte, J.M. and Croft, W.B. (1998). A Language Modelling Approach to
Information Retrieval. In: SIGIR '98, pp. 275-281.