SIMS 247 Assignment 2
Issued: Feb 5
Due: Feb 19 (extended from original deadline of Feb 17)
Format to turn in: hardcopy or pointer to a URL.
Last modified 2/13/98
We have obtained a version of Ahlberg's Spotfire (IVEE) System to
experiment with. Do not copy this program to use elsewhere as we
don't have permission for that (although you can download a similar
version from the web onto your own machine if you like). Spotfire
allows for some limited forms of filtering, brushing, and linking of
scatterplots and bar graphs.
You can run Spotfire from the startup menu on the SIMS NT accounts
(under Programs\Research & Analysis). Read the online manual that is
also accessible from the startup menu (for this to work, you must
first have the Netscape browser running and then select the manual
from the menu). Bear in mind that not everything in this manual is
correct -- I have found several errors in the general discussion.
When in dispute, rely on the published papers we've been reading.
First, play around with some of the sample data sets they have
supplied. Get to know how the sliders work and how to set the
features for changing color, shape, and size (under properties on the
Edit menu). Note that by right-clicking on one of the sliders you can
change how it displays ranges. Note also that you can get a linking
effect between scatterplots, bar charts, and a combination of these by
using the New Window command.
Load in the film dataset. Do some operations that involve first
filtering and then viewing different subsets of the data.
- Consider the alphabetical sliders (used for names of directors,
actors, etc.). Discuss their advantages and disadvantages as a device
for selection of this kind of information.
- There are a lot of nominal variables in this collection, which can
make comparing within a scatter plot trickier. Show an example
combining variations in color, shape, and size -- depending on
the data type -- within a scatter plot.
- What happens to a scatterplot when you adjust the X and Y
scrollbars? What can this be helpful for? How can it be misleading?
Why is the name ``zoom bar'' as used in the manual for these scrollbars
misleading?
Carol Anderson and I have put together a large collection of U.S. census
data, which you can get access to on newt at G:Groups\is247\data (there is
one big data file and some smaller subsets as well). A file
containing a description of this data appears in the data directory.
Each data point represents information for a U.S. county.
This collection is harder to work with in this tool than I
anticipated.
One of the main problems is the way nominal variables are
represented: each data point is a count of a given attribute of
a nominal variable for a given county. In other words, instead of a data
point representing a person with a given commute time, salary, race,
etc., a data point shows the number of people with a given commute
time, race, or salary range for an entire county.
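To make the difference concrete, here is a minimal sketch in Python. It is not part of Spotfire or of the census files; the county names, group labels, and the 30-minute commute cutoff are all invented for illustration. It collapses a few hypothetical person-level records into the kind of per-county counts described above.

```python
from collections import Counter

# Hypothetical person-level records: (county, race, commute_minutes).
# Every value here is made up purely for illustration.
people = [
    ("County X", "A", 25),
    ("County X", "B", 40),
    ("County X", "A", 10),
    ("County Y", "B", 55),
    ("County Y", "B", 15),
]


def commute_bucket(minutes):
    """Map a commute time to a coarse range label (assumed cutoff)."""
    return "under_30" if minutes < 30 else "30_plus"


def aggregate_by_county(records):
    """Collapse person-level rows into one row of counts per county."""
    counties = {}
    for county, race, minutes in records:
        counts = counties.setdefault(county, Counter())
        counts["race_" + race] += 1
        counts["commute_" + commute_bucket(minutes)] += 1
    return {name: dict(counts) for name, counts in counties.items()}


county_rows = aggregate_by_county(people)
for name, row in county_rows.items():
    print(name, row)
```

Once the rows are aggregated this way, the link between attributes is gone: the County X row says how many people fall in each race category and how many in each commute range, but not which commute range goes with which race.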
Another problem with the system is that it does not allow the plotting of
more than two variables at a time on the scatterplot.
However, after some experimenting I've found some
ways to display certain relationships.
- Assume you are trying to understand the relative proportions of
each race for all the counties. Plot the county counts for one race
vs. another on the X and Y axes. Now make two scatter plots, each
with one race on the Y axis and total population on the X axis. How does this
view differ from the first? What happens when you use a slider to
filter out subsets of another race?
- What is wrong with the way color/shape/size ranges are assigned by
the program for the purposes of this dataset?
- How can one show, say, the distribution of population sizes for
counties that lie within a range of states that are in close proximity
to one another?
- (This is the big question for this assignment.) Assume you are
working for two different bosses: Senator Pachyderm with ``traditional''
Republican party viewpoints, and Senator Burro with ``traditional''
Democratic party leanings. They are both working on the new budget.
Try to come up with at least one graph for each to help them make an
argument about some aspect of the budget (this does not have to
correspond to the real budget that is currently being negotiated, but
it can if you like). Show how each senator
might try to debunk the other senator's graph. Despite possible
temptations otherwise, please keep all descriptions civil and in good taste.
- Describe some modifications that should be made to the program to
make it easier to discover trends in this dataset. (But still only
using scatterplots and bar charts.)
- Optional Extra Credit. Load in a dataset of your own, and produce
graphs that show some trends. Explain the dataset and your graphs.
The dataset must have at least 300 points and must reflect real data.