SIMS 296a-4:
Text Data Mining
Schedule and Lectures


 

Lectures and Readings

We are using the following text for background: Foundations of Statistical Natural Language Processing by Manning and Schuetze, MIT Press 1999.

August 23

August 31

    Reading: Textbook Chapter 3 (Intro to Linguistics)
    Discussion Leader: Ronald Sprouse, Intro to Linguistics
    Discussion Leader: Hao Chen, Current LINDI prototype

September 6

    Labor Day: No class

September 13

    Reading: Textbook Chapter 2 (Intro to Probability Theory and Information Theory)
    Reading: LINDI documents
    Discussion Leader: Barbara Rosario, Mathematical Foundations

September 20

    Reading: Textbook Chapter 16 (Text Categorization)
    Homework: Investigate a project text domain
    Discussion Leader: Hao Chen

September 27

    Reading: No reading
    Homework: Investigate examples within a domain.
    Discussion Leader: group work

October 4

    Readings:
    • Textbook Chapter 5: Collocations
    • Allen article introducing mixed-initiative interactions (from IEEE Intelligent Systems, Trends & Controversies, to appear. Just read the first of these three unless you want to read more.)
    • Horvitz article on mixed-initiative interaction (from ACM CHI 99).
    • Hearst paper on computing directionality within sentences (from Text-based Intelligent Systems, edited by Paul Jacobs, Lawrence Erlbaum Associates, 1992.)


    Homework: none
    Discussion Leader: Andy Dolbey

October 11

    Readings:
    • Textbook Chapter 6: Statistical Inference
    • Plus three papers (get hard copies outside my office, 212 South Hall):
      • Rajeev Agarwal and Lois Boggess, "A Simple but Useful Approach to Conjunct Identification", in Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, 15-21, 1992.
      • Lois Boggess, Rajeev Agarwal, and Ron Davis, "Disambiguation of Prepositional Phrases in Automatically Labelled Technical Text", in Proceedings of AAAI 91, 155-159, 1991.
      • Sanda M. Harabagiu and Dan I. Moldovan, "Knowledge processing on an extended WordNet", in WordNet: An Electronic Lexical Database, Christiane Fellbaum (ed.), MIT Press, 1998.


    Homework: Each person has a task; we are working with the MeSH Scope Notes. See http://www.nlm.nih.gov/mesh/99MBrowser.html

    Discussion Leader: Jonathan Henke, N-grams over sparse data

October 18

October 25

November 1

    Readings: Textbook Chapter 9: HMMs.
    Discussion Leader: David Blei
    Homework: Each person has something to do.

November 8

    Readings: Textbook Chapter 10: Part-of-Speech Tagging.
    Discussion Leader: Barbara Rosario
    Homework: Each person has something to do.

November 15

November 22

    Homework: Write up goals for the end of the semester; get work done on those goals.