|
Date |
Lecture Topics / Assignments |
Readings (due on date shown) |
| | | |
1 | Aug 28 |
Course Introduction |
No reading. |
| Aug 30 |
Introductions; Python Intro |
Activity: Install Python and NLTK-lite
Chapter 1 from Jurafsky and Martin
Python Programming Fundamentals (NLTK-lite tutorial, Sections 2.1-2.4)
|
| | | |
2 | Sep 4 | Holiday: no class |
Do reading for Sep 6.
|
|
Sep 6 |
Tokenization, Regular Expressions
Assignment 1: Tokenization
|
Python Programming Fundamentals (NLTK-lite tutorial, Sections 2.5-2.8)
Regular Expressions, NLTK-lite tutorial
|
| | | |
3 | Sep 11 |
Morphology and Stemming
|
Wikipedia entry on Morphology
Words, Sections 3.1 - 3.3 and 3.5, NLTK-lite tutorial
|
| Sep 13 |
Computing with Ngrams
|
Words, Section 3.4, NLTK-lite tutorial
|
| | | |
| | | |
4 | Sep 18 |
POS Tagging |
Tagging, NLTK-lite Tutorial, Sections 4.0-4.4
|
| Sep 20 |
POS Tagging with n-grams
Assignment 2 assigned
|
Tagging, NLTK-lite Tutorial, Sections 4.4-4.7
|
|
| | | |
5 | Sep 25 |
Shallow Parsing
|
Chunk Parsing, NLTK-lite Tutorial, Sections 5.0-5.3
|
| Sep 27 |
Shallow Parsing, cont.
(code for lecture)
|
Chunk Parsing, NLTK-lite Tutorial, Sections 5.4-5.6
H.P. Luhn, The automatic creation of literature abstracts, IBM Journal of R&D, 2(2), 1958.
|
| | | |
6 | Oct 2 |
Summarization
|
H.P. Edmundson, New methods in automatic extracting, JACM, 16(2), 1969.
J. Kupiec, J. Pedersen, F. Chen, A trainable document summarizer,
Proc. of SIGIR, 1995.
|
| Oct 4 |
Summarization; Intro to Probability Theory
Assignment 3 assigned
|
D. Marcu, Discourse trees are good indicators of importance in text, in Advances in Automatic Text Summarization, 1999.
J. Goldstein, V. Mittal, J. Carbonell, M. Kantrowitz, Multi-Document Summarization by Sentence Extraction, ANLP/NAACL Workshop, 2000.
|
| | | |
| | | |
7 | Oct 9 |
Probabilities, cont.; Author Identification
|
Can Pseudonymity Really Guarantee Privacy? by Rao and Rohatgi, in
9th USENIX Security Symposium, 2000
|
| Oct 11 |
Guest lecture: Elizabeth Charnock and Steve Roberts of Cataphora
|
|
| | | |
8 | Oct 16 |
Text Classification Intro (Guest Lecture: Preslav Nakov)
|
Machine Learning in Automated Text Categorization, Sebastiani,
ACM Computing Surveys 34 (1), 2002. Sections 1-4.
(Note: this reading is optional.)
|
| Oct 18 |
Summarization experiment; Class project ideas
|
A Comparative Study on Feature Selection in Text Categorization, Yang and Pedersen, Proc. of ICML, 1997.
Sebastiani Survey, Section 5 (optional)
|
| | | |
9 | Oct 23 |
Text Classification: Feature Selection
Project Proposal assigned; due Oct 30
|
Sebastiani Survey, Sections 6.3, 6.4, 6.8, 6.9, 6.10 (optional)
|
| Oct 25 |
Guest lecture: Peter Jackson,
Chief Research Scientist and VP, Technology
Thomson Legal & Regulatory
|
|
| | | |
| | | |
10 | Oct 30 |
Text Classification: Using Weka
Assignment 4 assigned; due Nov 13
|
Sebastiani Survey, Sections 7.2, 7.3 (optional)
Weka Simple Experiments Documentation
Weka Explorer Documentation
|
| Nov 1 |
Text Classification: Algorithms
|
Weka Advanced Experiments Documentation
|
|
| | | |
11 | Nov 6 |
Clustering, LSA
|
(Optional)
An introduction to LSA
(found by Hannes)
|
| Nov 8 |
More cluster examples; Blog Analysis
|
Predicting Movie Sales from Blogger Sentiment,
Mishne & Glance, AAAI-CAAW 2006.
Deriving Marketing Intelligence from Online Discussion,
Glance et al., KDD'05 (optional)
|
| | | |
12 | Nov 13 |
Lexicon Acquisition
|
Extracting Product Features and Opinions from Reviews,
Popescu & Etzioni, HLT/EMNLP 2005.
Towards a Robust Metric of Opinion, Nigam & Hurst,
AAAI-EAAT 2004.
|
| Nov 15 |
Information Extraction
|
|
| | | |
| | | |
13 | Nov 20 |
Guest lecture: Barbara Rosario, Intel Research
Finding Semantic Relations
|
|
| Nov 22 |
Guest lecture: Roger Magoulas, O'Reilly Media
Text Mining at O'Reilly Media
|
|
| | | |
14 | Nov 27 |
Discourse Processing
Project Writeup and Presentation Schedule
|
|
| Nov 29 |
Question Answering
|
| | | |
15 | Dec 4 | Class Presentations | |
| Dec 6 | Class Presentations | |
| | | |
| | | |