Assignment 3: Exploratory Data Analysis
In this assignment you will use a set of visualization tools to perform exploratory data analysis on a dataset. Visualization is an important tool for three tasks: exploration (searching a dataset for interesting phenomena), confirmation (validating or refuting a hypothesis about features you believe to be present in the data), and presentation (conveying information to others). This assignment focuses on the first two tasks. Phenomena worth looking for include:
- relationships between pairs of variables (correlations, clusters)
- outliers of various kinds
- trends
The software you'll be using is:
- Spotfire (on lab machines)
- Eureka (on lab machines)
- Parvis Parallel Coordinates (download Java executable)
- Optional: try out some of the other views using ILOG (download Java executable).
(On the lab machines, the programs are under Programs > Research & Analysis > Data Visualization.)
The suggested dataset is the Florida election data from the CMU Statistical Data Repository. (Note: when downloading these files, be sure to use your browser's correct "Save file" operation. Internet Explorer tends to add extra characters that confuse the programs.)
- Background reading (ppt)
- Original dataset with commentary
- The 2000 Florida Ballots project (supplied by Melanie Feinberg)
- Eureka format
- Spotfire format
- STF format for Parvis
- ILOG format
Optional: Obtain a different dataset of your choice. It must have at least five dimensions (fields) and at least 100 records, and at least one of the important fields should be nominal (i.e., its values have no inherent ordering). The richer the dataset, the more likely you are to find something interesting, but avoid datasets with more than about 30 fields, as they tend to be too complex to evaluate without statistical help.
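If you bring your own dataset, a quick script can confirm that it meets the requirements above before you load it into the tools. The sketch below is one possible check, not part of the assignment. It assumes the data is a CSV file with a header row, and the names check_dataset and is_number are hypothetical. Its nominal-field test is only a heuristic: a column counts as nominal if any value is non-numeric, so categories stored as numeric codes will not be detected.

```python
import csv
from typing import Iterable

# Limits from the assignment text; MAX_FIELDS is the soft "about 30" limit.
MIN_FIELDS = 5
MIN_RECORDS = 100
MAX_FIELDS = 30


def is_number(value: str) -> bool:
    """True if the string parses as a float (e.g. "42", "3.5", "-1e3")."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_dataset(lines: Iterable[str]) -> list[str]:
    """Check CSV lines (header row first) against the assignment's rules.

    Returns a list of problems; an empty list means every check passed.
    ASSUMPTION: a column is treated as nominal if any non-empty value is
    not numeric, so categories stored as numeric codes are not detected.
    """
    reader = csv.DictReader(lines)
    rows = list(reader)
    fields = reader.fieldnames or []
    problems = []

    if len(fields) < MIN_FIELDS:
        problems.append(f"only {len(fields)} fields; need at least {MIN_FIELDS}")
    if len(fields) > MAX_FIELDS:
        problems.append(f"{len(fields)} fields; more than about {MAX_FIELDS} is hard to evaluate")
    if len(rows) < MIN_RECORDS:
        problems.append(f"only {len(rows)} records; need at least {MIN_RECORDS}")

    # A field is nominal if at least one non-empty value fails to parse as a number.
    nominal = [
        name for name in fields
        if any(row[name] and not is_number(row[name]) for row in rows)
    ]
    if not nominal:
        problems.append("no nominal (non-numeric) field found")
    return problems
```

To use it, open your file with open("my_data.csv", newline="") (a placeholder filename) and pass the file object to check_dataset; any strings it returns describe requirements the dataset does not yet meet.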
This assignment is due at the start of class on March 3rd. You are encouraged to work in pairs. To turn it in, I'd prefer that you post your results at a URL and email the URL to me. (If you must write it up in Word, post the file and mail me its URL; I don't like dealing with attachments.) Please also turn in a hard copy in class.