Review
Content Analysis:
- Transformation of raw text into more computationally useful forms
Words in text collections exhibit interesting statistical properties
- Word frequencies follow a Zipf distribution (frequency roughly inversely proportional to rank)
- Word co-occurrences are not independent
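
Both properties can be checked on a toy example. The sketch below is illustrative only: the corpus, the document list, and the helper names `rank_frequency` and `cooccurrence` are assumptions made up for this example, and the data is far too small to show a real Zipf curve. It ranks words by frequency (under Zipf's law, rank × frequency stays roughly constant on a large corpus) and compares the observed joint probability of two words with the product expected if they were independent.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, count, rank * count) tuples, most frequent first."""
    counts = Counter(tokens)
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count, rank * count)
            for rank, (word, count) in enumerate(ranked, start=1)]

def cooccurrence(docs, a, b):
    """Return (observed P(a and b), P(a) * P(b)) at the document level."""
    n = len(docs)
    sets = [set(d.split()) for d in docs]
    p_a = sum(a in s for s in sets) / n
    p_b = sum(b in s for s in sets) / n
    p_ab = sum(a in s and b in s for s in sets) / n
    return p_ab, p_a * p_b

# Toy data, invented for illustration.
text = "the cat sat on the mat the dog sat on the log"
table = rank_frequency(text.split())
for row in table:
    print(row)

docs = ["new york city", "new york times", "old car", "red car"]
observed, expected = cooccurrence(docs, "new", "york")
print(observed, expected)
```

Here "new" and "york" appear together in half of the documents, twice as often as independence would predict, which is the kind of dependence the bullet above refers to.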
Text documents are transformed to vectors
- Pre-processing
- Each vector is a point in a multi-dimensional space (one dimension per term)
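
The transformation from text to vectors can be sketched as follows. This is a minimal example under stated assumptions, not a definitive pipeline: the stopword list, the lowercase-and-letters-only tokenization, the raw term-count weighting, and the names `preprocess` and `vectorize` are all choices made for illustration.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "on", "and"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def vectorize(docs):
    """Map each document to a term-count vector over a shared vocabulary."""
    tokenized = [preprocess(d) for d in docs]
    # One dimension per distinct term, in alphabetical order.
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for t in toks:
            vec[index[t]] += 1
        vectors.append(vec)
    return vocab, vectors

# Toy documents, invented for illustration.
docs = ["The cat sat on the mat.",
        "The dog sat on the log, and the dog barked."]
vocab, vectors = vectorize(docs)
print(vocab)
for vec in vectors:
    print(vec)
```

Each position in a vector corresponds to one vocabulary term, so documents that share terms have non-zero values in the same dimensions.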