L21. VECTOR MODELS (11/9)

9 November 2009

The Boolean model represents a document as a set of index terms, each of which is either present or absent. This binary notion doesn't fit our intuition that terms differ in how strongly they suggest what a document is about. Vector models capture this intuition by representing documents and queries as term vectors whose weights can reflect how often a term occurs within a document or how well the term discriminates that document from others in the collection. Vector algebra then provides a way to compute similarity between queries and documents, and between documents, under the assumption that "closeness in space" means "closeness in meaning".

(Don't worry if you haven't thought about vectors in a long time. We'll review everything you need to know to understand how they work, and you'll get to practice your understanding with an assignment.)
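
To make the idea of weighted term vectors and "closeness in space" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method used in the lecture or the assignment: it assumes whitespace tokenization, a tf-idf weight of the form count × log(N / df), and cosine similarity as the measure of closeness. The function names (`term_counts`, `idf_weights`, `tfidf`, `cosine`) and the three toy documents are invented for this example.

```python
import math
from collections import Counter


def term_counts(text):
    """Map a text to a sparse vector of term counts (term -> count).
    Assumption: lowercasing plus whitespace splitting is enough tokenization."""
    return Counter(text.lower().split())


def idf_weights(count_vectors):
    """Inverse document frequency, log(N / df), for every term in the collection.
    Terms that occur in fewer documents receive larger weights."""
    n = len(count_vectors)
    df = Counter()
    for vec in count_vectors:
        df.update(vec.keys())  # each document contributes at most 1 per term
    return {term: math.log(n / d) for term, d in df.items()}


def tfidf(vec, idf):
    """Weight each term count by its idf; terms unseen in the collection get 0."""
    return {term: count * idf.get(term, 0.0) for term, count in vec.items()}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy collection and query, invented for illustration.
docs = [
    "vector models rank documents",
    "boolean models match documents",
    "cats chase mice",
]
query = "vector models"

doc_counts = [term_counts(d) for d in docs]
idf = idf_weights(doc_counts)
doc_vectors = [tfidf(c, idf) for c in doc_counts]
query_vector = tfidf(term_counts(query), idf)

# One similarity score per document, in the same order as `docs`.
scores = [cosine(query_vector, d) for d in doc_vectors]

# Print documents from most to least similar to the query.
for score, text in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {text}")
```

Unlike a Boolean match, the result is a ranking: the first document shares both query terms and scores highest, the second shares only "models" and scores lower, and the third shares no terms with the query and scores zero. Other weighting schemes and similarity measures are possible; this sketch picks one common combination.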

 

Download recorded lecture: Part 1: http://courses.ischool.berkeley.edu/i202/f09/files/202-20091109-part1.mp3

Part 2: http://courses.ischool.berkeley.edu/i202/f09/files/202-20091109-part2.mp3