Helpful Programming Links
Unfortunately, there really is no appropriate textbook for this course.
Instead we will rely on online readings and handouts.
Students will need to learn a bit of Python programming for this course, and so
an introductory Python book is recommended.
These are the two main NLP textbooks, used in most NLP courses:
This one is short; the chapter on text categorization is especially relevant,
but there isn't enough relevant material to merit requiring it:
This text has been around for quite a long time, but has some very useful
sections. An advantage is that it is now online and free.
These are guides to text processing as opposed to linguistic processing. The
first uses Java and is good for what it does (I have a copy if people want to
look at it). I haven't seen the second one.
The first below is an excellent introduction to Python; the second is an online
book which I've found a helpful supplement:
Books on English grammar. The first is a wonderful, concise summary. The
second is an amazing encyclopedic account of English grammar.
Some other NLP books:
Many NLP Software links
Below is software needed for installing NLTK and related code.
I recommend the pywin IDE, which is why it is included below.
The only one that isn't entirely straightforward is the last one -- the
collection of corpora. It requires you to select a subset of collections to
work with and uncompress those into a specific directory. Instructions are
I'm not entirely sure right now which collections we will use -- I could see all
of them being useful for various projects. It is very important that we have
WordNet, which is quite large. You can install a few now and add more as you go.
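Since the corpora can be installed a few at a time, it may help to check which ones are already in place. The following is a minimal sketch, assuming a reasonably recent NLTK release in which nltk.data.find and nltk.download are available; the helper name missing_corpora and the choice of "wordnet" and "brown" as the wanted subset are illustrative assumptions, not course requirements.

```python
import importlib.util


def missing_corpora(names):
    """Return the corpora in `names` that NLTK cannot find locally."""
    if importlib.util.find_spec("nltk") is None:
        # NLTK itself is not installed, so every corpus counts as missing.
        return list(names)

    import nltk

    missing = []
    for name in names:
        try:
            # Raises LookupError when the corpus is not on NLTK's data path.
            nltk.data.find(f"corpora/{name}")
        except LookupError:
            missing.append(name)
    return missing


# Illustrative subset (an assumption); WordNet is the one we must have.
wanted = ["wordnet", "brown"]
todo = missing_corpora(wanted)
print("Still to install:", todo)
```

With a recent NLTK, each missing corpus can then usually be fetched with nltk.download(name), for example nltk.download("wordnet"), as an alternative to uncompressing the collections by hand.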
Links to Supplementary Information
Regular Expressions and Finite Automata