SIMS 247: Information Visualization

Group Project: Bibliographic Information Visualization and Analysis (BIVA)

Team: Chitra Madhwacharyula, Colleen Whitney, and Lulu Guo

Characterization of Data

Background

Within the union catalog maintained by the California Digital Library (CDL), the UCLA bibliographic collection comprises about 4.5 million records describing items held in the UCLA libraries. We narrowed our project to the UCLA collection because we had access to related historical circulation data spanning mid-1999 to mid-2005.

Within this collection, we were interested only in items that circulate (about 1 million items), since the main goal of the BIVA project is to visualize and analyze circulation patterns. We therefore confined our dataset to items that circulated at least once between 1999 and 2005.

For our initial prototype, we selected a random sample of about 300 items. The second iteration expanded to a larger dataset, about 9,100 randomly selected items.
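The selection described above (keep only items with at least one circulation in the study period, then draw a random sample) could be sketched roughly as follows. This is an illustrative sketch only, not the project's actual extraction code: the function names, the (sysid, date) tuple layout, and the seed parameter are all assumptions.

```python
import random
from datetime import date


def circulated_ids(transactions, start, end):
    """Return the sorted, distinct item ids with at least one transaction
    dated within [start, end]. `transactions` is an iterable of
    (sysid, date) pairs (a hypothetical layout, assumed for this sketch)."""
    return sorted({sysid for sysid, when in transactions if start <= when <= end})


def sample_items(sysids, k, seed=None):
    """Draw up to k ids uniformly at random, without replacement.
    Passing a seed makes the draw repeatable."""
    rng = random.Random(seed)
    pool = list(sysids)
    return rng.sample(pool, min(k, len(pool)))
```

For example, calling sample_items(ids, 300, seed=1) would yield a prototype-sized sample, and a larger k would yield the second-iteration dataset from the same pool.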

Description of Dataset

The main tables in our dataset, extracted from disparate and complex datasets maintained at CDL, are as follows:

bib_items

This table stores minimal bibliographic information about every item, including a unique identifier (called sysid), the ISBN, title, and subject. Each item is classified into one of eleven general subject areas based on the first letter of its Library of Congress call number class. This table also contains a holdings count, which represents the number of libraries that hold the item within a large consortium of libraries nationwide.
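The first-letter classification can be illustrated with a small lookup. The grouping of Library of Congress classes into the eleven subject areas is not listed here, so the mapping below is a partial, hypothetical example (the letters are real LC classes, but the area labels and the "Other" fallback are assumptions), and the function name is invented for this sketch.

```python
# Hypothetical, partial mapping from LC class letter to a general subject
# area; the project's actual eleven-area grouping is not reproduced here.
EXAMPLE_AREAS = {
    "Q": "Science",
    "T": "Technology",
    "P": "Language and Literature",
    "H": "Social Sciences",
}


def subject_area(call_number, areas, default="Other"):
    """Classify an item by the first letter of its LC call number.
    Blank call numbers, non-alphabetic first characters, and unmapped
    letters all fall back to `default`."""
    text = call_number.strip()
    if not text or not text[0].isalpha():
        return default
    return areas.get(text[0].upper(), default)
```

Under this example mapping, a call number such as "QA76.9 .H85" would fall under "Science".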

circ_trans

This table contains minimal historical circulation transaction data for the UCLA library. Circulation information includes a transaction ID, which uniquely identifies the transaction, and the transaction date. Patron information is limited to a general category (staff, faculty, undergraduate, graduate, other).

mn_circ_summary

This is a summary table used to speed up the generation of bar charts by month within each year. It contains circulation transaction counts by year, month, and subject.

yr_circ_summary

This is a summary table used to speed up the generation of bar charts by subject across years. It contains circulation transaction counts by year and subject.
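As a rough illustration of how the two summary tables relate to the base tables, the sketch below precomputes them with SQLite. It is not the project's actual build process, and it makes several assumptions: that circ_trans carries the item's sysid, that transaction dates are stored as ISO strings (YYYY-MM-DD) in a column named trans_date, and that the count column is named circ_count. The function name is likewise hypothetical.

```python
import sqlite3


def build_summaries(conn: sqlite3.Connection) -> None:
    """Rebuild both summary tables from circ_trans joined to bib_items.
    Assumed columns: circ_trans(sysid, trans_date), bib_items(sysid, subject)."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS mn_circ_summary;
        CREATE TABLE mn_circ_summary AS
        SELECT CAST(strftime('%Y', t.trans_date) AS INTEGER) AS year,
               CAST(strftime('%m', t.trans_date) AS INTEGER) AS month,
               b.subject AS subject,
               COUNT(*) AS circ_count
        FROM circ_trans AS t
        JOIN bib_items AS b ON b.sysid = t.sysid
        GROUP BY year, month, b.subject;

        -- The yearly table is rolled up from the monthly one, so the two
        -- summaries always agree with each other.
        DROP TABLE IF EXISTS yr_circ_summary;
        CREATE TABLE yr_circ_summary AS
        SELECT year, subject, SUM(circ_count) AS circ_count
        FROM mn_circ_summary
        GROUP BY year, subject;
        """
    )
```

With tables like these in place, a bar chart needs only a small lookup (for instance, all rows for one year) rather than a scan of every transaction, which is the speed-up the summary tables are meant to provide.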