Week | Date | Lecture Topics / Assignments | Reading |
1 | Aug 30 | Course Introduction | Start learning Python |
| Sep 1 | Using Large Collections; Getting started with NLTK | NLTK Tutorial 1 |
2 | Sep 6 | Holiday: no class | Charming Python (NLTK introduction; note: some details are out of date) |
| Sep 8 | Tokenization, Morphological Analysis; Assignment (due Sep 15): Tokenizing Exercises 1-3 | NLTK Tokenizing Tutorial |
3 | Sep 13 | Part-of-Speech Tagging; Assignment (due Sep 15): Tagging Exercises 1a-h, 2, 3, 4, 5a-b | NLTK Tagging Tutorial |
| Sep 15 | Conditional Probabilities and Transformation-based Tagging | |
4 | Sep 20 | Shallow Parsing | NLTK Chunking Tutorial (html) (pdf) |
| Sep 22 | Chunking, continued; Assignment 2 assigned | |
5 | Sep 27 | Text Classification: Introduction | NLTK Tutorial 5 (first half); Paper: Adaptive Multilingual Sentence Boundary Disambiguation |
| Sep 29 | Text Classification: Feature Selection; Assignment 2 due | |
6 | Oct 4 | Text Classification: Algorithms; Assignment 3 assigned | |
| Oct 6 | Text Classification: Weka | |
7 | Oct 11 | Information Extraction | (Optional) Introduction to IE Technology, by Appelt and Israel, IJCAI 1999 (pdf) |
| Oct 13 | Information Extraction: ML Techniques | (Optional) Overview of CoNLL-2003 |
8 | Oct 18 | Email and Anti-Spam Analysis; Assignment 3 due Oct 18 | |
| Oct 20 | Text Data Mining | (Optional) Untangling Text Data Mining |
9 | Oct 25 | Lexicons and Ontologies | |
| Oct 27 | Guest Lecture: Charles Fillmore, FrameNet | (Background) The Berkeley FrameNet Project |
10 | Nov 1 | In-class work on the Enron Email Collection; Assignment 4 assigned | Salon article; Explaining the Enron Bankruptcy at CNN.com; Houston Chronicle news report on the latest in Ken Lay's trial |
| Nov 3 | In-class work | Enron email DB statistics; employee roles (xls); Enron Chronology |
11 | Nov 8 | Using Very Large Corpora / Spelling Correction / Clustering | Banko & Brill '01; Cucerzan & Brill '04; Lapata & Keller '04 |
| Nov 10 | Guest Lecture: Drago Radev on Text Summarization | |
12 | Nov 15 | Question Answering | An Analysis of the AskMSR Question-Answering System, by Brill, Dumais, and Banko '02; High Performance Question/Answering, by Pasca and Harabagiu '01 |
| Nov 17 | Question Answering; Assignment 4 due Nov 19 | (Optional) Performance Issues and Error Analysis in an Open-Domain Question Answering System, by Moldovan, Pasca, Harabagiu, and Surdeanu '03 |
13 | Nov 22 | Guest Lecture: Allison Woodruff and Paul Aoki on Designing Audio Systems for Social Talk | |
| Nov 24 | Discuss Project Ideas | |
14 | Nov 29 | Machine Translation (slides by Kevin Knight) | Church and Hovy, Good Applications for Crummy Machine Translation; (Optional) Marcu and Wong, A Phrase-Based, Joint Probability Model for Statistical Machine Translation, EMNLP'02 |
| Dec 1 | Discourse Processing, Text Segmentation | Hearst, Multi-Paragraph Segmentation of Expository Text, ACL'94; Sporleder and Lapata, Automatic Paragraph Identification: A Study across Languages and Domains, EMNLP'04 |
15 | Dec 6 | Class Presentations | |
| Dec 8 | Class Presentations | |