SIMS 290-2: Applied Natural Language Processing

Fall 2004, Prof. Marti Hearst

Assignments

Final Project Description

Final Project Class Presentations

Monday Dec 6:

Eva Mok:
Semantic Analysis of Child-Directed Mandarin Chinese using Construction Grammar
Christine Hodges:
Preliminary exploration of applications of the FrameNet database for question answering
Preslav Nakov:
Noun Compound Bracketing
Roger Bock:
Acronym Recognition and Disambiguation
Dan Perkel and Ryan Shaw:
Clustering community reviews of Internet Archive content
Andrea La Pietra, Sarah Poon, and Hong Qu:
NLP Analysis of Linguistic Features of Popular Blogs

Wednesday Dec 8:

Jeff Heer:
Social Network Analysis of the Enron Data Set

Kavita Mittal and Annie Yeh:
Recipe Back-of-the-Book Indexer

Brooke Maury and Vijay Viswanathan:
Recording Artist Community Metadata

Simon King and Jeff Towle:
Improving Automated Metadata Hierarchy Generation

Yongwook Jeong:
Comparing WordNet Similarity Measures

Andrew Fiore:
Analysis of the Enron Corpus via Clustering

Murali Rangan:
Qualifying social relations in Enron Data Set

Assignment 4

Assignment 4: Enron Email Corpus

Assignment 4 Sample Projects:

Roger Bock: Improving the acronym definition recognizer. pdf
Murali Rangan: Acronym definition recognition according to different search terms. doc
Christine Hodges and Andrea La Pietra: Analyzing an NER and visualizing the resulting networks. doc
Eva Mok: Mapping names to email addresses and doing network analysis. pdf and text file with name mappings
Jeff Heer: Initial processing for social network analysis. doc
Sarah Poon and Hong Qu: Analyzing assertion of political influence via NER. doc

Assignment 3

Assignment 3: Text Classification

Assignment 2

Assignment 2 Sample Solutions:

Andrew Fiore writeup (doc) code (py)
Roger Bock (used rules other than ChunkRule) writeup (pdf) code (py)
Ryan Shaw writeup (txt) code (py)

Choose a text collection (one provided by NLTK, or any other you may want to use; in the latter case you should run a POS tagger over it first). Choose a verb that interests you, and find all the sentences that contain that verb (it is probably best to include all of its conjugated forms). Be sure you have a good-sized set of sentences containing the verb in its various forms (at least 30).
Using the NLTK shallow parsing facility (chunk, unchunk, and chink rules along with RegexpChunkParser), produce shallow parses of the selected sentences. You may want to start with the rules that I presented in class, but you should improve on them greatly. I suggest using multiple rules for each type of chunk (NP, VP, PP, others if you like). When you turn in the assignment you should show some before and after parses of the same sentences to show how much your rule rewrites have improved the chunker.
Analyze the argument structure of the verb that you've chosen. What kinds of subjects and objects does it tend to take, both syntactically and semantically?
(Optional) Now try to find at least two verbs that take objects or subjects that are similar in form, either syntactically, semantically, or both. If you can't find any, try to describe why not. Be sure to describe how you tried to find the similar verbs.
You may want to use the functions that I discussed in class: chunking.py
Optional paper that has some good ideas:
VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations, Chklovski & Pantel. EMNLP 2004. pdf
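
The first two steps above (selecting sentences for a verb, then chunking them with tag patterns) can be sketched roughly as follows. This is a standard-library illustration of the idea only, not the NLTK API and not the code in chunking.py: the toy sentences, the names sentences_with_verb and chunk_np, and the single NP pattern are all assumptions made for the example. It encodes each sentence's tags as a string such as <DT><NN> and runs one regular expression over that string.

```python
import re

# Toy POS-tagged sentences as (word, tag) pairs, Penn Treebank style tags.
# These sentences are invented for illustration, not taken from a corpus.
TAGGED = [
    [("The", "DT"), ("committee", "NN"), ("gave", "VBD"), ("the", "DT"),
     ("young", "JJ"), ("student", "NN"), ("an", "DT"), ("award", "NN")],
    [("She", "PRP"), ("walks", "VBZ"), ("home", "NN")],
    [("Donors", "NNS"), ("give", "VBP"), ("money", "NN"), ("to", "TO"),
     ("local", "JJ"), ("schools", "NNS")],
]

# Conjugated forms of the chosen verb (hypothetical choice: "give").
VERB_FORMS = {"give", "gives", "gave", "given", "giving"}


def sentences_with_verb(tagged_sents, forms):
    """Keep sentences in which some verb-tagged token matches one of the forms."""
    return [
        sent for sent in tagged_sents
        if any(tag.startswith("VB") and word.lower() in forms for word, tag in sent)
    ]


# One NP rule: optional determiner, any adjectives, one or more nouns.
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ[RS]?>)*(?:<NNP?S?>)+")


def chunk_np(sent):
    """Return the NP chunks of a tagged sentence, each as a list of words."""
    tag_string = "".join(f"<{tag}>" for _, tag in sent)
    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        # Convert character offsets in the tag string back to token indexes
        # by counting how many tags open before each offset.
        start = tag_string.count("<", 0, match.start())
        end = tag_string.count("<", 0, match.end())
        chunks.append([word for word, _ in sent[start:end]])
    return chunks


if __name__ == "__main__":
    for sent in sentences_with_verb(TAGGED, VERB_FORMS):
        print(chunk_np(sent))
```

A single rule like this misses pronouns, possessives, and conjoined noun phrases, which is the kind of gap the assignment asks you to close with additional rules.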

To turn in:

Samples of before-and-after parses of sentences using your improved rules compared to those I've provided, and a description of your regexps.
Your code in one or more files.
A description of the characteristics of the context surrounding the verb, answering the questions posed above.
(Optional) A description of the verbs similar to this one that you found (or if you couldn't find them, say why not), and how you did this analysis.
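
One possible starting point for describing the context surrounding the verb is to tally the noun phrases that sit immediately before and after it. The sketch below assumes the chunker output has already been converted to (label, words) pairs; that data layout, the sample sentences, and the function names head and tally_arguments are hypothetical. Treating the last word of a chunk as its head, and the adjacent NPs as subject and object, are crude heuristics rather than a real analysis.

```python
from collections import Counter

# Hypothetical chunker output: each sentence is a list of (label, words) pairs.
CHUNKED = [
    [("NP", ["The", "committee"]), ("VP", ["gave"]),
     ("NP", ["the", "young", "student"]), ("NP", ["an", "award"])],
    [("NP", ["Donors"]), ("VP", ["give"]), ("NP", ["money"]),
     ("PP", ["to"]), ("NP", ["local", "schools"])],
    [("NP", ["The", "committee"]), ("VP", ["gives"]), ("NP", ["grants"])],
]

# Conjugated forms of the chosen verb (hypothetical choice: "give").
VERB_FORMS = {"give", "gives", "gave", "given", "giving"}


def head(words):
    """Crude head heuristic: the last word of the chunk, lowercased."""
    return words[-1].lower()


def tally_arguments(chunked_sents, forms):
    """Count heads of the NP just before and the NP just after the verb chunk."""
    subjects, objects = Counter(), Counter()
    for sent in chunked_sents:
        for i, (label, words) in enumerate(sent):
            # Skip chunks that are not a VP containing the target verb.
            if label != "VP" or not any(w.lower() in forms for w in words):
                continue
            if i > 0 and sent[i - 1][0] == "NP":
                subjects[head(sent[i - 1][1])] += 1
            if i + 1 < len(sent) and sent[i + 1][0] == "NP":
                objects[head(sent[i + 1][1])] += 1
    return subjects, objects


if __name__ == "__main__":
    subjects, objects = tally_arguments(CHUNKED, VERB_FORMS)
    print("subject heads:", subjects.most_common())
    print("object heads:", objects.most_common())
```

Counts like these only cover the syntactic side; the semantic side (for example, whether the subjects tend to be people or organizations) still needs to be read off the examples by hand or with a resource such as WordNet.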

Due Wed Sept 29 at 8pm.

Assignment 1

Exercises 1-3 from the Tokenizing Tutorial, and 1a-h, 2, 3, 4, 5a-b from the Tagging Tutorial. Due Wed Sept 15 at 8pm.
Preslav Nakov and Barbara Rosario: Suggested solutions for A1; Suggested Code