SIMS 290-2: Applied Natural Language Processing

   Fall 2004, Prof. Marti Hearst

Course Information

Schedule

    Note: Times and Topics Subject to Change

 

Date

Lecture Topics / Assignments

Reading

1 Aug 30 Course Introduction Start learning Python
Sep 1 Using Large Collections; Getting started with NLTK NLTK Tutorial 1
2 Sep 6 Holiday: no class Charming Python (NLTK Intro. Note: some details out of date)
Sep 8 Tokenization, Morphological Analysis
Assignment (due Sep 15): Tokenizing Exercises 1-3
NLTK Tokenizing Tutorial
3 Sep 13 Part-of-Speech Tagging
Assignment (due Sep 15): Tagging Exercises 1a-h, 2, 3, 4, 5a-b
NLTK Tagging Tutorial
Sep 15 Conditional Probabilities and Transformation-based Tagging
4 Sep 20 Shallow Parsing NLTK Chunking Tutorial (html) (pdf)
Sep 22 Chunking, continued
Assignment 2 assigned
5 Sep 27 Text Classification: Introduction NLTK Tutorial 5 (first half)
Paper: Adaptive Multilingual Sentence Boundary Disambiguation
Sep 29 Text Classification: Feature Selection
Assignment 2 due
6 Oct 4 Text Classification: Algorithms
Assignment 3 assigned
Oct 6 Text Classification: Weka
7 Oct 11 Information Extraction (Reading Optional) Introduction to IE Technology by Appelt and Israel, IJCAI 1999. (pdf)
Oct 13 Information Extraction: ML Techniques (Optional Reading) Overview of CoNLL-2003
8 Oct 18 Email and anti-spam analysis
Assignment 3 due Oct 18
Oct 20 Text Data Mining (Optional) Untangling Text Data Mining
9 Oct 25 Lexicons and Ontologies
Oct 27 Guest Lecture: Charles Fillmore, FrameNet (Background) The Berkeley FrameNet Project
10 Nov 1 In-class work on The Enron Email Collection
Assignment 4 assigned
  • Salon article
  • Explaining the Enron Bankruptcy at CNN.com
  • Houston Chronicle News Report on the latest in Ken Lay's trial
Nov 3 In-class work
  • Enron email DB statistics
  • employee roles (xls)
  • Enron Chronology
11 Nov 8 Using Very Large Corpora / Spelling Correction / Clustering
  • Banko & Brill'01
  • Cucerzan & Brill '04
  • Lapata & Keller '04
Nov 10 Guest lecture: Drago Radev on Text Summarization
12 Nov 15 Question Answering
  • An Analysis of the AskMSR Question-Answering System, by Brill, Dumais, and Banko '02
  • High performance question/answering, Pasca and Harabagiu '01
Nov 17 Question Answering
Assignment 4 due Nov 19
  • (Optional): Performance Issues and Error Analysis in an Open-Domain Question Answering System, Moldovan, Pasca, Harabagiu, Surdeanu '03
13 Nov 22 Guest Lecture: Allison Woodruff and Paul Aoki on Designing Audio Systems for Social Talk
Nov 24 Discuss Project Ideas
14 Nov 29 Machine Translation (slides by Kevin Knight) Church and Hovy, Good Applications for Crummy Machine Translation
(Optional) Marcu and Wong, A phrase-based, joint probability model for statistical machine translation, EMNLP'02
Dec 1 Discourse Processing, Text Segmentation Hearst, Multi-Paragraph Segmentation of Expository Text, ACL'94. Sporleder and Lapata, Automatic Paragraph Identification: A Study across Languages and Domains, EMNLP'04.
15 Dec 6 Class Presentations
Dec 8 Class Presentations