Search and Retrieval: More on Term Weighting and Document Ranking
Today
Finding Out About
Ranking Algorithms
Vector Representation (revisited; see Salton article in Science)
Assigning Weights to Terms
Assigning Weights
tf x idf
tf x idf normalization
Vector space similarity (use the weights to compare the documents)
Vector Space Similarity Measure: combine tf x idf into a similarity measure
To Think About
Computing Similarity Scores
Computing a similarity score
Other Major Ranking Schemes
Staged Logistic Regression
Multi-Dimensional Space
Text Clustering
Pair-wise Document Similarity
Pair-wise Document Similarity (no normalization for simplicity)
Using Clustering
Clustering
Clustering Multi-Dimensional Document Space (image from Wise et al 95)
Concept “Landscapes” from Kohonen Feature Maps (X. Lin and H. Chen)
Graphical Depictions of Clusters
Another Approach to Term Weighting: Latent Semantic Indexing
Document/Term Matrix
Finding Similar Tokens
Email: hearst@sims.berkeley.edu
Home Page: http://sims.berkeley.edu/~hearst