SIMS 247 Assignment 2

Issued: Feb 5
Due: Feb 19 (extended from original deadline of Feb 17)
Format to turn in: hardcopy or pointer to a URL.

Last modified 2/13/98


We have obtained a version of Ahlberg's Spotfire (IVEE) System to experiment with. Do not copy this program to use elsewhere as we don't have permission for that (although you can download a similar version from the web onto your own machine if you like). Spotfire allows for some limited forms of filtering, brushing, and linking of scatterplots and bar graphs.

You can run Spotfire from the startup menu on the SIMS NT accounts (under Programs\Research & Analysis). Read the online manual, which is also accessible from the startup menu (for this to work you must first have the Netscape browser running and then select the manual from the menu). Bear in mind that not everything in this manual is correct -- I have found several errors in the general discussion. Where the manual and the papers disagree, rely on the published papers we've been reading.

First, play around with some of the sample data sets they have supplied. Get to know how the sliders work and how to set the features for changing color, shape, and size (under properties on the Edit menu). Note that by right-clicking by one of the sliders you can change how it displays ranges. Note also that you can get a linking effect between scatterplots, bar charts, and a combination of these by using the New Window command.

Load in the film dataset. Do some operations that involve first filtering and then viewing different subsets of the data.

  1. Consider the alphabetical sliders (used for names of directors, actors, etc.). Discuss their advantages and disadvantages as a device for selecting this kind of information.

  2. There are a lot of nominal variables in this collection, which can make comparing within a scatter plot trickier. Show an example combining variations in color, shape, and size -- depending on the data type -- within a scatter plot.

  3. What happens to a scatterplot when you adjust the X and Y scrollbars? What can this be helpful for? How can it be misleading? Why is the name ``zoom bar'' as used in the manual for these scrollbars misleading?

Carol Anderson and I have put together a large collection of U.S. census data, which you can get access to on newt at G:Groups\is247\data (there is one big data file and some smaller subsets as well). A file containing a description of this data appears in the data directory. Each data point represents information for a U.S. county.

This collection is harder to work with in this tool than I anticipated.

One of the main problems is the way nominal variables are represented -- each data point is a count of a given attribute of a nominal variable for a given county. In other words, instead of a data point representing a person with a given commute time, salary, race, etc., a data point shows the number of people with a given commute time, race, or salary range for an entire county.
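
To make the difference between the two representations concrete, here is a minimal sketch in Python. It is not part of Spotfire or of the census files; the county names, group labels, and record layout are all invented for illustration. It collapses per-person records into the per-county counts form described above.

```python
from collections import Counter

# Hypothetical per-person records: (county, race label, commute in minutes).
# These values are made up; the real census file is NOT in this form.
people = [
    ("County X", "a", 25),
    ("County X", "b", 40),
    ("County X", "a", 10),
    ("County Y", "b", 35),
]


def county_counts(records):
    """Collapse per-person records into one row of attribute counts per county."""
    rows = {}
    for county, race, _commute in records:
        # Each county gets one row; each race label becomes a count column.
        rows.setdefault(county, Counter())["race_" + race] += 1
    return {county: dict(counter) for county, counter in rows.items()}


counts = county_counts(people)
for county, row in counts.items():
    print(county, row)
```

Once the data is in this aggregated form, a point on a scatterplot is a county rather than a person, so a single nominal variable such as race is spread across several count columns instead of being one attribute that could be mapped to color or shape.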

Another problem is that the system does not allow plotting more than two variables at a time on the scatterplot.

However, after doing some experimenting around I've found some ways to display certain relationships.

  1. Assume you are trying to understand the relative proportions of each race for all the counties. Plot the county counts for one race vs. another on the X and Y axes. Now make two scatter plots, each with one race on the Y axis and total population on the X axis. How does this view differ from the first? What happens when you use a slider to filter out subsets of another race?

  2. What is wrong with the way color/shape/size ranges are assigned by the program for the purposes of this dataset?

  3. How can one show, say, the distribution of population sizes for counties that lie within a range of states that are in close proximity to one another?

  4. (This is the big question for this assignment.) Assume you are working for two different bosses: Senator Pachyderm, with ``traditional'' Republican party viewpoints, and Senator Burro, with ``traditional'' Democratic party leanings. They are both working on the new budget. Try to come up with at least one graph for each to help them make an argument about some aspect of the budget (this does not have to correspond to the real budget that is currently being negotiated, but it can if you like). Show how each senator might try to debunk the other senator's graph. Despite possible temptations otherwise, please keep all descriptions civil and in good taste.

  5. Describe some modifications that should be made to the program to make it easier to discover trends in this dataset. (But still only using scatterplots and bar charts.)

  6. Optional Extra Credit. Load in a dataset of your own, and produce graphs that show some trends. Explain the datasets and your graphs. The dataset must have at least 300 points and must reflect real data.