Course Info
The central question of our course is how computational linguistics and information visualization can be put to the service of text-based research in the humanities.
This course will bring together students from the humanities who want to learn how technology can change how they do research, and students from information and computer science who want to help design and build the next generation of tools for humanities scholars, with a focus on analysis of written literature.
Students from each discipline will be expected to be open to learning from the other. The course will consist of readings and discussion of research papers as well as analysis and evaluation of existing tools. Students will be expected to contribute to the design, analysis, and/or evaluation of a new software tool for scholarly literature analysis.
Students interested in machine learning and natural language processing (NLP) will have the opportunity to apply those tools to literary analysis problems. Students interested in human-computer interaction (HCI) will have the opportunity to engage with problems of information design and visualization for analysis, search, and navigation.
Humanities students should have an open mind and a passion to learn about new techniques.
Units: 3
Challenges
Text similarity and the vocabulary problem.
Literature scholars and historians are often interested in passages of text that share a theme or concept. So far, the only way for them to find passages of interest is to read their texts closely and mark the passages individually. This is an unreliable and time-consuming process: researchers' moods, recent experiences, and states of mind affect what they notice when they read. How can we apply NLP and HCI to make this process faster and more reliable?
- Natural language processing:
  - Finding similar or related passages of text.
  - Finding search terms to broaden a search.
  - Calculating words that behave in 'similar' ways to words of interest.
  - Using annotations and annotated passages to infer interesting items.
- Human-computer interaction:
  - Helping users keep track of and find interesting passages while reading.
  - Exposing patterns of word usage and context in a list of search results.
  - Transitioning between searching for interesting passages and reading/annotating text.
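As a rough illustration of the "finding similar or related passages" item, the sketch below ranks passages by TF-IDF cosine similarity using only the Python standard library. It is one common baseline, not a method prescribed by the course; the function names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`) and the smoothed IDF formula are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(passages):
    """Return one sparse TF-IDF vector (a dict of word -> weight) per passage."""
    counts = [Counter(tokenize(p)) for p in passages]
    n = len(passages)
    # Document frequency: the number of passages that contain each word.
    df = Counter(word for c in counts for word in c)
    # Smoothed IDF (an assumption of this sketch): rarer words get larger weights,
    # and words found in every passage still keep a weight of 1.0.
    idf = {w: math.log((1 + n) / (1 + d)) + 1.0 for w, d in df.items()}
    return [{w: tf * idf[w] for w, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_index, passages):
    """Rank all other passages by similarity to passages[query_index].

    Returns a list of (score, passage_index) pairs, best match first.
    """
    vectors = tfidf_vectors(passages)
    query = vectors[query_index]
    scored = [(cosine(query, v), i)
              for i, v in enumerate(vectors) if i != query_index]
    return sorted(scored, reverse=True)
```

This baseline only rewards passages that share actual words, so two passages on the same theme with different vocabulary score low. That limitation is the vocabulary problem named above, and it is why the other items in the list (broadening search terms, finding words that behave similarly) matter.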
Visualization and Analysis
Humanities researchers working on different problems need different kinds of information about their texts. Visualizations of text-based data can provide overviews and perspectives unavailable from reading. However, it's often not enough to simply extract and visualize the information: when building tools, it's important to think about interactions, as well as how the visualization fits into the rest of the scholar's analysis process.
- Will the researcher want to:
  - Go from the visualization to the source data?
  - Save the results of a visualization?
  - Compare two or more visualizations?
  - Create another one based on the current visualization?
  - Perform a search based on the visualization?
- Other questions are important too:
  - What kinds of zooming, grouping and filtering should the visualization support?
  - What source data is the most helpful?
  - What is the best way to help users manage and track their analysis history?
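One way to make the "go from the visualization to the source data" question concrete is to keep source offsets alongside whatever numbers get plotted. The sketch below counts occurrences of a word in fixed-size segments of a text, as one might for a simple dispersion chart, and records the character offsets of every match so that a selected bar can be traced back to the passage. The names (`Bin`, `bin_occurrences`, `context`) and the fixed `bin_size` are hypothetical choices for the example, not part of any existing tool.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Bin:
    """One bar of the chart: a segment of the source text plus its matches."""
    start: int  # character offset where the segment begins in the source
    end: int    # character offset where the segment ends (exclusive)
    hits: list = field(default_factory=list)  # (start, end) offsets of matches


def bin_occurrences(text, word, bin_size=200):
    """Split text into fixed-size segments and record where `word` occurs.

    The bar height for each segment is len(bin.hits); the offsets let an
    interface jump from a bar back to the exact place in the source.
    """
    bins = [Bin(s, min(s + bin_size, len(text)))
            for s in range(0, len(text), bin_size)]
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        # A match is assigned to the segment in which it starts.
        bins[match.start() // bin_size].hits.append((match.start(), match.end()))
    return bins


def context(text, hit, width=30):
    """Return the matched word with `width` characters of context on each side."""
    start, end = hit
    return text[max(0, start - width):min(len(text), end + width)]
```

Because each bin stores offsets rather than only a count, the same structure can support several of the questions above: regrouping with a different `bin_size` approximates zooming, and the stored hits can seed a follow-up search or be saved with the scholar's notes.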
Background
Information and computer science students should have general programming skills and experience in some subset of the following: database programming, XML design, graphic design, user interface design, information visualization, natural language processing, machine learning, data mining, and statistical analysis.