Final Project Class Presentations
Monday Dec 6:
Wednesday Dec 8:
Semantic Analysis of Child-Directed Mandarin Chinese using Construction Grammar
Preliminary exploration of applications of the FrameNet
database for question answering
Noun Compound Bracketing
Acronym Recognition and Disambiguation
Dan Perkel and Ryan Shaw:
Clustering community reviews of Internet Archive content
Andrea La Pietra, Sarah Poon, and Hong Qu:
NLP Analysis of Linguistic Features of Popular Blogs
Social Network Analysis of the Enron Data Set
- Kavita Mittal and Annie Yeh:
Recipe Back-of-the-Book Indexer
- Brooke Maury and Vijay Viswanathan:
Recording Artist Community Metadata
- Simon King and Jeff Towle:
Improving Automated Metadata Hierarchy Generation
- Yongwook Jeong:
Comparing WordNet Similarity Measures
- Andrew Fiore:
Analysis of the Enron Corpus via Clustering
- Murali Rangan:
Qualifying social relations in Enron Data Set
Assignment 4 Sample Projects:
- Roger Bok: Improving the acronym definition recognizer.
- Murali Rangan: Acronym definition recognition according to
different search terms.
- Christine Hodges and Andrea La Pietra: Analyzing an NER and visualizing the
- Eva Mok: Mapping names to email addresses and doing network analysis.
text file with name mappings
- Jeff Heer: Initial processing for social network analysis.
- Sarah Poon and Hong Qu: Analyzing assertion of political influence
Assignment 2 Sample Solutions:
Choose a text collection (one provided by NLTK, or any other you may want to
use; in the latter case you should run a POS tagger over it first).
Choose a verb that interests you, and find all the sentences that are tagged
with that verb (probably best to use all of its conjugated forms). Be sure you
have a good-sized number of sentences containing the verb in its various forms
(at least 30).
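One way to collect these sentences is sketched below. This is only an illustration, not the code from chunking.py: it uses plain Python lists of (word, tag) pairs rather than any NLTK data structure, the helper name sentences_with_verb is made up for this example, and it assumes a Brown/Penn-style tagset in which every verb tag starts with "VB".

```python
# Hypothetical helper: not part of NLTK or of chunking.py.
def sentences_with_verb(tagged_sents, verb_forms):
    """Return the tagged sentences that use any of verb_forms as a verb."""
    forms = {form.lower() for form in verb_forms}
    hits = []
    for sent in tagged_sents:
        # Require both a matching word form and a verb tag, so that a noun
        # use such as "give/NN" is not counted.
        if any(word.lower() in forms and tag.startswith("VB")
               for word, tag in sent):
            hits.append(sent)
    return hits


# Toy data; assumes Brown/Penn-style tags where verb tags start with "VB".
TAGGED_SENTS = [
    [("She", "PRP"), ("gave", "VBD"), ("him", "PRP"), ("a", "DT"), ("book", "NN")],
    [("The", "DT"), ("give", "NN"), ("in", "IN"), ("the", "DT"), ("rope", "NN")],
    [("They", "PRP"), ("are", "VBP"), ("giving", "VBG"), ("advice", "NN")],
    [("He", "PRP"), ("left", "VBD")],
]
GIVE_FORMS = ["give", "gives", "gave", "given", "giving"]

matches = sentences_with_verb(TAGGED_SENTS, GIVE_FORMS)
print(len(matches), "sentences found")
```

Listing the conjugated forms by hand keeps the filter transparent; if the count comes out under 30, a larger collection or a more frequent verb is probably needed.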
Using the NLTK shallow parsing facility (chunk, unchunk, and chink rules
along with RegexpChunkParser), produce shallow parses of the selected
sentences. You may want to start with the ones that I presented in class, but
you should improve on them greatly. I suggest using multiple rules for each
type of chunk (NP, VP, PP, others if you like). When you turn in the assignment
you should show some before and after parses on the same sentences to show how
much your rule rewrites have improved the chunker.
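To make the idea of chunk and chink rules and a before-and-after comparison concrete, here is a small sketch. It deliberately avoids the NLTK classes (whose names and signatures vary between releases) and re-implements the idea with the standard re module: tags are written as a string such as "<DT><JJ><NN>", a chunk rule marks spans whose tags match a pattern, and a simplified chink rule removes single tokens from existing chunks. The function names chunk, chink, and bracketed are assumptions made for this example only.

```python
import re


def tag_string(sent):
    """Encode the tags of a tagged sentence as '<DT><JJ><NN>...'."""
    return "".join("<%s>" % tag for _, tag in sent)


def chunk(sent, pattern):
    """Chunk rule: return (start, end) token spans whose tags match pattern."""
    tags = tag_string(sent)
    spans = []
    for m in re.finditer(pattern, tags):
        # Convert character offsets to token indices by counting '<'.
        start = tags.count("<", 0, m.start())
        end = tags.count("<", 0, m.end())
        if end > start:
            spans.append((start, end))
    return spans


def chink(sent, spans, pattern):
    """Simplified chink rule: drop tokens whose single tag matches pattern
    from the given chunks, splitting a chunk where a token is removed."""
    tag_re = re.compile(pattern)
    result = []
    for start, end in spans:
        run_start = None
        for i in range(start, end):
            if tag_re.fullmatch("<%s>" % sent[i][1]):
                if run_start is not None:
                    result.append((run_start, i))
                    run_start = None
            elif run_start is None:
                run_start = i
        if run_start is not None:
            result.append((run_start, end))
    return result


def bracketed(sent, spans, label="NP"):
    """Render a sentence with chunk spans shown in brackets."""
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    out = []
    for i, (word, tag) in enumerate(sent):
        if i in starts:
            out.append("[" + label)
        out.append("%s/%s" % (word, tag))
        if i + 1 in ends:
            out.append("]")
    return " ".join(out)


SENT = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
        ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Before: a naive NP rule that misses adjectives.
before = chunk(SENT, r"<DT><NN>")
# After: optional determiner, any adjectives, one or more nouns.
after = chunk(SENT, r"(<DT>)?(<JJ>)*(<NN[A-Z]*>)+")
# Alternative: chunk everything, then chink verbs and prepositions.
everything = chunk(SENT, r"(<[^>]+>)+")
chinked = chink(SENT, everything, r"<VB[A-Z]*>|<IN>")

print("before: ", bracketed(SENT, before))
print("after:  ", bracketed(SENT, after))
print("chinked:", bracketed(SENT, chinked))
```

On this toy sentence the improved chunk rule and the chunk-then-chink approach produce the same two noun phrases, while the naive rule finds only the second. Printing both versions for the same sentences is the kind of comparison to include in what you turn in.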
Analyze the argument structure of the verb that you've chosen. What kinds of
subjects and objects does it tend to take, both syntactically and semantically?
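A first pass at the syntactic side of this question could simply tally which noun phrases appear next to the verb. The sketch below is one crude way to do that, under stated assumptions: each sentence is already chunked into a flat list of (label, words) pairs, the nearest NP to the left of the verb is treated as its subject and the nearest NP to the right as its object, and the last word of an NP stands in for its head. The data layout and the function name argument_heads are invented for this example.

```python
from collections import Counter

# Assumed layout: each sentence is a flat list of (label, [words]) chunks.
CHUNKED = [
    [("NP", ["the", "teacher"]), ("VP", ["gave"]),
     ("NP", ["the", "students"]), ("NP", ["an", "assignment"])],
    [("NP", ["she"]), ("VP", ["has", "given"]), ("NP", ["advice"]),
     ("PP", ["to"]), ("NP", ["friends"])],
    [("NP", ["the", "teacher"]), ("VP", ["gives"]), ("NP", ["advice"])],
]


def argument_heads(chunked_sents, verb_forms):
    """Count head words of the NPs nearest to the verb on each side."""
    forms = set(verb_forms)
    subjects, objects = Counter(), Counter()
    for sent in chunked_sents:
        for i, (label, words) in enumerate(sent):
            if label != "VP" or not forms.intersection(words):
                continue
            # Nearest NP to the left: treated as the subject (a heuristic).
            for prev_label, prev_words in reversed(sent[:i]):
                if prev_label == "NP":
                    subjects[prev_words[-1]] += 1
                    break
            # Nearest NP to the right: treated as the object (a heuristic).
            for next_label, next_words in sent[i + 1:]:
                if next_label == "NP":
                    objects[next_words[-1]] += 1
                    break
    return subjects, objects


subjects, objects = argument_heads(
    CHUNKED, ["give", "gives", "gave", "given", "giving"])
print("subjects:", subjects.most_common())
print("objects: ", objects.most_common())
```

The heuristic will be wrong for passives, ditransitives, and fronted objects, so the counts are a starting point for the analysis rather than the analysis itself; the semantic side still needs you to look at what kinds of things the head words name.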
(Optional) Now try to find at least two verbs that take objects or subjects that are similar in
form, either syntactically, semantically, or both. If you can't find any, try
to describe why not. Be sure to describe how you tried to find the similar verbs.
You may want to use the functions that I discussed in class: chunking.py
Optional paper that has some good ideas:
VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations,
Chklovski & Pantel. EMNLP 2004.
To turn in:
Due Wed Sept 29 at 8pm.
- Samples of before-and-after parses of sentences using your improved rules
compared to those I've provided, and a description of your regexps.
- Your code in one or more files.
- A description of the characteristics of the context surrounding the verb,
answering the questions posed above.
- (Optional) A description of the verbs similar to this one that you found (or if you
couldn't find them, say why not), and how you did this analysis.
Exercises 1-3 from the Tokenizing Tutorial, and 1a-h, 2, 3, 4, 5a-b from the
Tagging Tutorial. Due Wed Sep 15 at 8pm.
Preslav Nakov and Barbara Rosario: Suggested solutions for A1;