Document Space has High Dimensionality

What happens beyond three dimensions?

Similarity still depends on how many tokens two documents have in common.
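
As a rough illustration of token-based similarity in a space with one dimension per distinct term, the sketch below uses cosine similarity over raw term counts. The slide does not name a specific measure, so cosine is an assumption here; the function names `term_vector` and `cosine_similarity` and the sample sentences are hypothetical.

```python
from collections import Counter
from math import sqrt


def term_vector(text):
    """Count tokens; each distinct token is one dimension of document space."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (assumed measure)."""
    # Only tokens present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents: two share most tokens, one shares none.
doc_a = term_vector("the cat sat")
doc_b = term_vector("the cat ran")
doc_c = term_vector("stock prices fell")

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
print(sim_ab, sim_ac)
```

Documents with no tokens in common score zero no matter how many dimensions the space has, which is why shared tokens remain the basis of similarity beyond three dimensions.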

More terms -> more dimensions, so it becomes harder to see which subsets of words similar documents share.

One approach to handling high dimensionality:

Clustering
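
To make the idea concrete, here is a minimal sketch of one very simple clustering scheme: single-pass "leader" clustering with a cosine-similarity threshold. This is an illustrative stand-in, not necessarily the method these slides have in mind; the names `leader_cluster` and `threshold`, the threshold value, and the sample documents are all assumptions.

```python
from collections import Counter
from math import sqrt


def term_vector(text):
    """Count tokens; each distinct token is one dimension."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def leader_cluster(texts, threshold=0.3):
    """Put each document in the first cluster whose leader is similar enough.

    The first document of a cluster acts as its leader; a document that
    matches no existing leader starts a new cluster.
    """
    clusters = []  # list of (leader_vector, [document indices])
    for i, text in enumerate(texts):
        vec = term_vector(text)
        for leader, members in clusters:
            if cosine(vec, leader) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Hypothetical documents on two topics.
docs = [
    "stock market prices fall",
    "market prices rise as stock rallies",
    "team wins the football match",
    "football team loses the match",
]
groups = leader_cluster(docs)
print(groups)
```

Grouping documents this way replaces the question of which words are shared with the simpler question of which group a document belongs to. The result depends on document order and on the chosen threshold, so treat it only as a starting point.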
