Search and Retrieval: More on Term Weighting and Document Ranking

11/11/97




Table of Contents

Search and Retrieval: More on Term Weighting and Document Ranking

Today

Finding Out About

Ranking Algorithms

PPT Slide

Vector Representation (revisited; see Salton article in Science)

Assigning Weights to Terms

Assigning Weights to Terms

Assigning Weights

tf x idf

tf x idf normalization

Vector space similarity (use the weights to compare the documents)

Vector Space Similarity Measure (combine tf x idf into a similarity measure)

To Think About

Computing Similarity Scores

Computing a similarity score

Other Major Ranking Schemes

Other Major Ranking Schemes

Staged Logistic Regression

Multi-Dimensional Space

Text Clustering

Text Clustering

Pair-wise Document Similarity

Pair-wise Document Similarity (no normalization for simplicity)

Using Clustering

Using Clustering

Clustering

Using Clustering

Clustering Multi-Dimensional Document Space (image from Wise et al. 95)

Clustering Multi-Dimensional Document Space (image from Wise et al. 95)

Concept “Landscapes” from Kohonen Feature Maps (X. Lin and H. Chen)

Graphical Depictions of Clusters

Another Approach to Term Weighting: Latent Semantic Indexing

Document/Term Matrix

Finding Similar Tokens

Document/Term Matrix

Author: hearst

Email: hearst@sims.berkeley.edu

Home Page: http://sims.berkeley.edu/~hearst
