Document Space has High Dimensionality
What happens beyond three dimensions?
Similarity still depends on how many terms (tokens) two documents have in common.
More terms -> harder to see which subsets of words similar documents share.
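A minimal sketch of the idea, under stated assumptions. Each distinct term in the collection is one dimension, and each document is a vector of term counts. The slides do not name a similarity measure, so cosine similarity is used here as an assumed, common choice. Whitespace tokenization and the three example documents are likewise simplifications made for illustration. Even this tiny collection already has 11 dimensions, and overlap only shows up on the few terms two documents actually share.

```python
import math
from collections import Counter

def term_vector(text, vocabulary):
    # Count each vocabulary term in the document (naive whitespace tokenization).
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def shared_terms(u, v, vocabulary):
    # The dimensions where both documents have a nonzero count.
    return [t for t, a, b in zip(vocabulary, u, v) if a and b]

docs = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "stock prices fell sharply",
]
# One dimension per distinct term across all documents.
vocabulary = sorted({w for d in docs for w in d.lower().split()})
vectors = [term_vector(d, vocabulary) for d in docs]

print(len(vocabulary))                                    # 11 dimensions
print(shared_terms(vectors[0], vectors[1], vocabulary))   # ['cat', 'the']
print(round(cosine(vectors[0], vectors[1]), 3))           # 0.668
print(cosine(vectors[0], vectors[2]))                     # 0.0, no terms in common
```

With a realistic vocabulary of tens of thousands of terms, most entries in each vector are zero. Listing the shared dimensions directly, as `shared_terms` does, stops being a practical way to understand why documents are similar.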
One approach to handling high dimensionality: