Table of Contents
Current Topics in Information Access: IR Background
Last Time
Today
Some IR History
Information Retrieval
Structure of an IR System
PPT Slide
PPT Slide
PPT Slide
PPT Slide
Steps in a “typical IR System”
Stemming and Morphological Analysis
Automated Methods
Errors Generated by Porter Stemmer (Krovetz 93)
Query Languages
Simple query language: Boolean
Boolean Queries
Boolean Queries
Boolean Queries
Boolean Logic
Boolean Searching
Boolean Problems
Advantages and Disadvantages of the Boolean Model
Pseudo-Boolean Queries
Boolean Extensions
Ranking Algorithms
PPT Slide
Indexing and Representation: The Vector Space Model
Document Representation: What values to use for terms
Document Vectors
Vector Representation
Document Vectors
Assigning Weights
Assigning Weights
tf x idf
tf x idf normalization
Vector Space Similarity Measure: combine tf x idf into a similarity measure
Computing Similarity Scores
Documents in Vector Space
Computing a similarity score
Similarity Measures
Problems with Vector Space
Probabilistic Models
Probabilistic Retrieval
Probabilistic Models: Some Notation
Probabilistic Models: Logistic Regression
Probabilistic Models: Logistic Regression attributes
Probabilistic Models: Logistic Regression
Probabilistic Models
Vector and Probabilistic Models
Simple Presentation of Results
Problems with Vector Space
Evaluation
What to Evaluate?
What to Evaluate?
Relevance
Standard IR Evaluation
Precision/Recall Curves
Precision/Recall Curves
Precision/Recall Curves
Document Cutoff Levels
The E-Measure
TREC
Sample TREC queries (topics)
TREC
TREC Results
Blair and Maron 1985
Blair and Maron, cont.
Blair and Maron, cont.
PPT Slide
Creating a Keyword Index
Inverted files
Inverted Files
How Are Inverted Files Created
How Inverted Files are Created
How Inverted Files are Created
How Inverted Files are Created
An Example IR System
Next Time