SIMS 290-2: Applied Natural Language Processing

   Fall 2004, Prof. Marti Hearst

Course Information


Final Project Description

Final Project Class Presentations

Monday Dec 6:

  • Eva Mok:
    Semantic Analysis of Child-Directed Mandarin Chinese using Construction Grammar

  • Christine Hodges:
    Preliminary exploration of applications of the FrameNet database for question answering

  • Preslav Nakov:
    Noun Compound Bracketing

  • Roger Bock:
    Acronym Recognition and Disambiguation

  • Dan Perkel and Ryan Shaw:
    Clustering community reviews of Internet Archive content

  • Andrea La Pietra, Sarah Poon, and Hong Qu:
    NLP Analysis of Linguistic Features of Popular Blogs

Wednesday Dec 8:

  • Jeff Heer:
    Social Network Analysis of the Enron Data Set

  • Kavita Mittal and Annie Yeh:
    Recipe Back-of-the-Book Indexer

  • Brooke Maury and Vijay Viswanathan:
    Recording Artist Community Metadata

  • Simon King and Jeff Towle:
    Improving Automated Metadata Hierarchy Generation

  • Yongwook Jeong:
    Comparing WordNet Similarity Measures

  • Andrew Fiore:
    Analysis of the Enron Corpus via Clustering

  • Murali Rangan:
    Qualifying social relations in Enron Data Set

Assignment 4

    Assignment 4: Enron Email Corpus

    Assignment 4 Sample Projects:

    • Roger Bock: Improving the acronym definition recognizer. pdf
    • Murali Rangan: Acronym definition recognition according to different search terms. doc
    • Christine Hodges and Andrea La Pietra: Analyzing an NER and visualizing the resulting networks. doc
    • Eva Mok: Mapping names to email addresses and doing network analysis. pdf and text file with name mappings
    • Jeff Heer: Initial processing for social network analysis. doc
    • Sarah Poon and Hong Qu: Analyzing assertion of political influence via NER. doc

Assignment 3

Assignment 2

    Assignment 2 Sample Solutions:

    Choose a text collection (one provided by NLTK, or any other you may want to use; in the latter case you should run a POS tagger over it first). Choose a verb that interests you, and find all the sentences in which that verb is tagged as a verb (it is probably best to include all of its conjugated forms). Be sure you have a good-sized number of sentences containing the verb in its various forms (at least 30).
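
    The selection step above can be sketched as follows. This is only an illustration under stated assumptions: the tiny hand-tagged sample, the name `sentences_with_verb`, and the use of tags beginning with "VB" to mark verbs are not part of the assignment, and a real run would read sentences from a POS-tagged collection instead.

```python
# Hypothetical hand-tagged sample; a real run would load a POS-tagged corpus.
TAGGED_SENTENCES = [
    [("She", "PRP"), ("gave", "VBD"), ("him", "PRP"), ("a", "DT"), ("book", "NN")],
    [("They", "PRP"), ("give", "VBP"), ("generously", "RB")],
    [("The", "DT"), ("give", "NN"), ("in", "IN"), ("the", "DT"), ("rope", "NN"), ("helped", "VBD")],
    [("He", "PRP"), ("was", "VBD"), ("given", "VBN"), ("a", "DT"), ("chance", "NN")],
    [("Dogs", "NNS"), ("bark", "VBP")],
]

# Conjugated forms of the chosen verb, listed by hand (an assumption of this sketch).
VERB_FORMS = {"give", "gives", "gave", "given", "giving"}


def sentences_with_verb(tagged_sentences, verb_forms):
    """Return the sentences in which one of verb_forms occurs with a verb tag."""
    selected = []
    for sentence in tagged_sentences:
        for word, tag in sentence:
            # Require a verb tag so that noun uses such as "the give" are skipped.
            if word.lower() in verb_forms and tag.startswith("VB"):
                selected.append(sentence)
                break
    return selected


matches = sentences_with_verb(TAGGED_SENTENCES, VERB_FORMS)
for sentence in matches:
    print(" ".join(word for word, _ in sentence))
```

    Checking the tag as well as the word form keeps homographs out of the sample, which matters when the later analysis counts subjects and objects.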

    Using the NLTK shallow parsing facility (chunk, unchunk, and chink rules along with RegexpChunkParser), produce shallow parses of the selected sentences. You may want to start with the rules that I presented in class, but you should improve on them greatly. I suggest using multiple rules for each type of chunk (NP, VP, PP, others if you like). When you turn in the assignment you should show some before and after parses on the same sentences to show how much your rule rewrites have improved the chunker.
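
    To illustrate what a before-and-after comparison might look like, here is a minimal sketch of the tag-pattern idea behind regular-expression chunking. It deliberately uses only Python's standard `re` module and is not NLTK's RegexpChunkParser; the function name `np_chunks`, the two patterns, and the example sentence are hypothetical. It assumes every pattern is built from whole `<TAG>` units.

```python
import re


def np_chunks(tagged, pattern):
    """Return the word spans whose tag sequence matches a tag pattern."""
    # Encode the tags as "<DT><JJ><NN>" so the pattern can range over whole tags.
    tag_string = "".join("<%s>" % tag for _, tag in tagged)

    # Map each character offset where a tag begins (or the string ends)
    # back to the index of the corresponding token.
    offset_to_index = {}
    offset = 0
    for index, (_, tag) in enumerate(tagged):
        offset_to_index[offset] = index
        offset += len(tag) + 2  # two extra characters for "<" and ">"
    offset_to_index[offset] = len(tagged)

    chunks = []
    for match in re.finditer(pattern, tag_string):
        if match.start() == match.end():
            continue  # ignore empty matches
        first = offset_to_index[match.start()]
        last = offset_to_index[match.end()]
        chunks.append(" ".join(word for word, _ in tagged[first:last]))
    return chunks


SENTENCE = [
    ("The", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"), (".", "."),
]

# "Before": a determiner followed by exactly one singular noun.
BASELINE = np_chunks(SENTENCE, r"<DT><NN>")
# "After": optional determiner, any number of adjectives, one or more nouns.
IMPROVED = np_chunks(SENTENCE, r"(<DT>)?(<JJ>)*(<NN[A-Z]*>)+")

print("before:", BASELINE)
print("after: ", IMPROVED)
```

    The baseline rule misses the noun phrase with adjectives, while the broader rule recovers it; listing such pairs side by side is one way to show how much a rule rewrite has improved the chunker.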

    Analyze the argument structure of the verb that you've chosen. What kinds of subjects and objects does it tend to take, both syntactically and semantically?

    (Optional) Now try to find at least two verbs that take objects or subjects that are similar in form, either syntactically, semantically, or both. If you can't find any, try to describe why not. Be sure to describe how you tried to find the similar verbs.

    You may want to use the functions that I discussed in class:

    Optional paper that has some good ideas:
      VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations, Chklovski & Pantel. EMNLP 2004. pdf

    To turn in:
    1. Samples of before-and-after parses of sentences using your improved rules compared to those I've provided, and a description of your regexps.
    2. Your code in one or more files.
    3. A description of the characteristics of the context surrounding the verb, answering the questions posed above.
    4. (Optional) A description of the verbs similar to this one that you found (or if you couldn't find them, say why not), and how you did this analysis.

    Due Wed Sept 29 at 8pm.

Assignment 1

    Exercises 1-3 from the tokenizing tutorial, and 1a-h, 2, 3, 4, 5a-b from the Tagging Tutorial. Due Wed Sep 15 at 8pm.

    Preslav Nakov and Barbara Rosario: Suggested solutions for A1; Suggested Code