SIMS 247: Information Visualization

Group Project: Bibliographic Information Visualization and Analysis (BIVA)

Team: Chitra Madhwacharyula, Colleen Whitney, and Lulu Guo

Rationale for Design Decisions

Overview-to-detail

When we first considered the general layout of our visualization, PaperLens was a valuable source of ideas, so we took a similar approach in our design. In the left panel, the visualization displays an overview of the whole collection, organizing the circulation information by subject. Within each subject, circulation is graphed over time. In this way, users can quickly grasp the pattern of circulation by subject over the entire collection and easily make comparisons across subjects and time periods.

We tried several ways of displaying the subject, time, and circulation information. In the end we agreed that a bar chart format was the most compact and effective. Since we had 11 subjects, we reduced the size of each bar chart and arranged the charts in a two-column layout to minimize scrolling, which might be inconvenient for our users.

From the left panel, we wanted users to be able to zoom in and view more detailed information about a subject and time period of interest. In addition, we wanted to introduce a new dimension, holdings, because we are interested in the relationship between circulation and holdings as we consider how each measure might serve as a shorthand gauge of the popularity of items in the collection.

After some discussion, we concluded that a scatter plot might be the most helpful way to visualize a possible correlation between these two numeric variables. So we created links: clicking on any bar in any of the bar charts generates, in the upper-right panel, a scatter plot detailing the relationship between circulation and holdings for that particular subject and time period. We hoped that with the scatter plot, users would be able to discover any interesting relationships between circulation (Y-axis) and holdings (X-axis). For example, could a relationship between the two variables be observed from the graphs? If so, was it a positive correlation, a negative correlation, or some other pattern?
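The question of whether circulation and holdings move together can also be checked numerically. The following is a minimal sketch, not part of the BIVA implementation: it computes the Pearson correlation coefficient for a set of items, and the `items` values are invented purely for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical items for one subject and time period: (holdings, circulation).
items = [(1, 2), (2, 4), (3, 5), (4, 9), (5, 10)]
holdings = [h for h, _ in items]        # X-axis of the scatter plot
circulation = [c for _, c in items]     # Y-axis of the scatter plot

r = pearson_r(holdings, circulation)
print(f"r = {r:.2f}")
```

A value of r near +1 or -1 suggests a strong positive or negative linear relationship; a value near 0 suggests none, although the scatter plot may still reveal a non-linear pattern or outliers that a single number hides.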

From the scatter plot, we wanted to provide access to item-level information, in order to allow for the exploration of outliers. This was realized in the lower-right panel. After studying the data available to us and revisiting our project goal, we decided to show only the most relevant information in this panel, given the limited space. Based on the needs of our CDL users, we ended up choosing just four key variables: item ID, item title, circulation dates (listed in chronological order), and patron type. A more fully developed visualization could include more detailed item-level data.

Filters (interactivity)

In our first prototype, we didn’t have filters for subject and time. We quickly realized the drawback: the left panel could not be used flexibly. To compare any two specific bar charts, users would have to scroll up and down, and they could easily lose track of the information when shifting their attention among the similar graphs. We thought that adding filters could make the left-hand section of the screen much more flexible, so in the second prototype we added them.

Users now have more control over which information is displayed on the screen; they can remove the subject bar charts they are not interested in. Moreover, when they select a particular year from the year filter, all the bar charts in the left panel automatically zoom in to monthly data for that year. Although the filters took up a little more screen space, they definitely added interactivity and flexibility to our design.
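The filter behavior described above can be sketched in a few lines. This is an illustrative outline under assumptions, not the code we used: the `events` records, the `bar_chart_data` function, and its parameters are all hypothetical names.

```python
from collections import defaultdict

# Hypothetical circulation events: (subject, year, month).
events = [
    ("History", 2003, 1),
    ("History", 2003, 1),
    ("History", 2004, 7),
    ("Physics", 2003, 3),
    ("Physics", 2004, 3),
    ("Music", 2004, 12),
]

def bar_chart_data(events, subjects=None, year=None):
    """Return {subject: {period: count}} for the bar charts in the left panel.

    subjects: set of subjects to keep, or None to keep every subject.
    year: None yields one bar per year; a specific year yields one bar
          per month of that year (the "zoom in to monthly data" behavior).
    """
    charts = defaultdict(lambda: defaultdict(int))
    for subject, ev_year, ev_month in events:
        if subjects is not None and subject not in subjects:
            continue  # subject filter: this chart was removed by the user
        if year is None:
            charts[subject][ev_year] += 1
        elif ev_year == year:
            charts[subject][ev_month] += 1
    # Sort periods so the bars appear in chronological order.
    return {s: dict(sorted(p.items())) for s, p in charts.items()}

overview = bar_chart_data(events)
zoomed = bar_chart_data(events, subjects={"History", "Physics"}, year=2003)
print(overview)
print(zoomed)
```

With no filters the function returns yearly counts for every subject; selecting a year switches the period key from year to month, so the same drawing code can render either level of detail.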

Choice of software

Given the size of our dataset, it was impossible for us to hard-code everything into our visualization. We therefore needed a tool that would allow us to generate the graphs dynamically. From our previous experience with Tableau and Spotfire, we thought they might be good options to consider.

Tableau had several advantages. The feature most relevant to our project was that it could generate small multiples, which would be necessary to realize our multi-panel design. However, these multiples were all static. They couldn’t fulfill our brushing and linking requirement, and therefore couldn’t allow us to generate dynamic graphs. Spotfire, by contrast, had excellent brushing and linking capabilities, but it couldn’t generate small multiples for different levels of data exploration.

Finally, we decided to use a PHP graphics package (JpGraph) backed by a MySQL database. First, we could easily connect our data in the MySQL database to the online front-end visualization by writing a small amount of PHP code, which made it easy to generate dynamic content. Second, JpGraph automatically generated client-side image maps, which made it possible to produce drill-down graphs. Third, the library assigned context-sensitive default values to most of the parameters in a graph, which shortened the learning curve. Finally, it supported a wide range of plot types, including scatter plots, bar charts, and many others. Although we later found that JpGraph did not meet all of our needs (it didn’t include a plot type well suited to timelines), we felt that with JpGraph we could develop almost all of our desired features within a relatively short time frame.
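To illustrate the general idea of a database-backed, drill-down bar chart, the sketch below aggregates circulation counts with a SQL query and builds an HTML image map with one clickable region per bar. It is a simplified stand-in, not our PHP code and not JpGraph's API: Python's built-in `sqlite3` replaces MySQL, and the table schema, the bar geometry, and the `scatter.php` target are all assumptions made for the example.

```python
import sqlite3
from html import escape
from urllib.parse import urlencode

# In-memory SQLite stands in for the MySQL database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE circulation (item_id TEXT, subject TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO circulation VALUES (?, ?, ?)",
    [("a1", "History", 2003), ("a1", "History", 2003),
     ("b2", "History", 2004), ("c3", "Physics", 2003)],
)

def yearly_counts(conn, subject):
    """Return [(year, circulation_count), ...] for one subject, oldest first."""
    cursor = conn.execute(
        "SELECT year, COUNT(*) FROM circulation "
        "WHERE subject = ? GROUP BY year ORDER BY year",
        (subject,),
    )
    return cursor.fetchall()

def image_map(subject, counts, bar_width=20, gap=5, height=100):
    """Build an HTML <map> whose <area> rectangles cover the bars of a chart."""
    if not counts:
        return f'<map name="{escape(subject)}"></map>'
    peak = max(n for _, n in counts)
    areas = []
    for i, (year, n) in enumerate(counts):
        left = i * (bar_width + gap)
        top = height - round(height * n / peak)  # taller bar, smaller top value
        # Hypothetical drill-down page that would draw the scatter plot.
        href = "scatter.php?" + urlencode({"subject": subject, "year": year})
        areas.append(
            f'<area shape="rect" coords="{left},{top},{left + bar_width},{height}" '
            f'href="{escape(href)}" alt="{escape(subject)} {year}">'
        )
    return f'<map name="{escape(subject)}">' + "".join(areas) + "</map>"

counts = yearly_counts(conn, "History")
html = image_map("History", counts)
print(html)
```

Each `<area>` links a bar to a drill-down view for that subject and year, which is the same interaction pattern that the client-side image maps made possible in our design.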