202 search tool now available

Hello everybody,

Our promised 202 search tool is now available at bit.ly/202search

You can enter a query and the tool will perform 4 different searches against the corpus of 202 lecture slides. The search algorithms are:

  1. Standard vector search (gives more weight to the most frequent words)
  2. Normalized (tf-idf) vector search (penalizes terms that occur in a lot of documents)
  3. LSA search with 1 dimension reduced
  4. LSA search with 3 dimensions reduced (these last two algorithms are the most interesting ones, because the documents they return will not necessarily contain the query terms but will contain terms that tend to appear together with them)
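
For the curious, here is a minimal sketch of why an LSA search can return a document that never mentions the query term. It is not the tool's actual code: the toy term-document matrix, the function names (cosine_scores, lsa_scores) and the use of NumPy are all assumptions made for illustration, and k here is the number of latent dimensions kept, which may not map directly onto the "dimensions reduced" settings listed above.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents). Illustrative only.
# doc 0: "car engine", doc 1: "automobile engine", docs 2 and 3: "flower petal"
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
], dtype=float)


def cosine_scores(query_vec, doc_matrix):
    """Cosine similarity between one vector and every column of doc_matrix."""
    denom = np.linalg.norm(query_vec) * np.linalg.norm(doc_matrix, axis=0)
    safe = np.where(denom == 0, 1.0, denom)  # avoid dividing by zero
    return np.where(denom == 0, 0.0, (query_vec @ doc_matrix) / safe)


def lsa_scores(term_doc, query_vec, k):
    """Project documents and query onto the top-k left singular vectors, then compare."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]                 # keep the k strongest latent dimensions
    docs_k = U_k.T @ term_doc      # documents in the latent space
    query_k = U_k.T @ query_vec    # query folded into the same space
    return cosine_scores(query_k, docs_k)


query = np.array([1, 0, 0, 0, 0], dtype=float)  # the single-term query "car"
raw = cosine_scores(query, A)      # plain vector search
lsa = lsa_scores(A, query, k=2)    # LSA search with 2 latent dimensions kept
print("plain:", np.round(raw, 3))
print("LSA:  ", np.round(lsa, 3))
```

In this toy corpus the second document ("automobile engine") scores 0 in the plain vector search because it does not contain "car", but it scores close to 1 under LSA because "car" and "automobile" both co-occur with "engine".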

Please be patient. Each query may take around one minute to process, since the tool builds the document matrix and performs a lot of calculations (stop-word removal, stemming, tf, idf, singular value decomposition, etc.) on the fly. But hopefully it's going to provide you with the precision and recall you want!
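
To give an idea of what those preprocessing steps involve, here is a rough sketch of stop-word removal, stemming and tf-idf weighting using only the Python standard library. Everything in it is an assumption for illustration: the stop-word list is tiny, crude_stem is a stand-in for a real stemmer (such as Porter's), and the weighting shown (raw term count times log(N / df)) is only one common tf-idf variant, not necessarily the one the tool uses.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def crude_stem(word):
    """Strip a couple of common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split into words, drop stop words, stem what is left."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


docs = [
    "Searching the lecture slides",
    "The slides of the lecture",
    "Faceted classification",
]
for doc, w in zip(docs, tf_idf(docs)):
    print(doc, "->", {t: round(v, 3) for t, v in w.items()})
```

Terms that appear in many documents (here "lecture" and "slide") end up with lower weights than terms that appear in only one, which is the penalty the normalized search relies on.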

We divided each lecture into 4 or 5 different topics. We wanted more granular documents to benefit more from LSA recommendations. This should also lead to more precise results. Once you click on the link you want, you will only need to scroll through 5 or 6 slides instead of 30.

If you find any bugs, please let us know so we can improve the tool.

Good luck on the final exam,

Karen, Satish and Julián

PS Some examples of queries you can try:

  • vocabulary control
  • weinberger
  • faceted classification
  • semantic analysis