Buffalo

The sentence “Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo” is grammatically and semantically valid English, and a great example of the challenges homonymy presents for information retrieval (IR). Although a search engine that lowercases its terms would index this as eight instances of the same word, “buffalo” actually appears in three distinct senses:

  • the city Buffalo, NY
  • the animal buffalo
  • the verb buffalo (meaning “to bully”)
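
To make the indexing point concrete, here is a minimal sketch, assuming a plain whitespace tokenizer and optional lowercasing. Both are simplifications rather than how any particular search engine works, and the names `SENTENCE` and `term_counts` are made up for illustration.

```python
from collections import Counter

# The sentence from the post, with its original capitalization.
SENTENCE = "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo"


def term_counts(text, lowercase=True):
    """Count whitespace-separated tokens, optionally lowercasing them first.

    This is a toy stand-in for an indexer (an assumption, not a real engine).
    """
    tokens = text.split()
    if lowercase:
        tokens = [token.lower() for token in tokens]
    return Counter(tokens)


# Lowercased terms: every token collapses into a single term.
folded = term_counts(SENTENCE)

# Case preserved: the city separates out, but the animal and the verb
# still share the term "buffalo".
cased = term_counts(SENTENCE, lowercase=False)

print(dict(folded))  # {'buffalo': 8}
print(dict(cased))   # {'Buffalo': 3, 'buffalo': 5}
```

Even with case preserved, the counts only tell the city apart from everything else. The five lowercase tokens are three animals and two verbs, and nothing in a bag of terms distinguishes them.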

Parsed, the sentence reads: “THE buffalo FROM Buffalo WHO ARE buffaloed BY buffalo FROM Buffalo, buffalo (verb) OTHER buffalo FROM Buffalo.” I will be amazed if an NLP system is ever able to understand this, considering I couldn’t figure it out without the Wikipedia article.

http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo