is247 Project Proposal: TravelLite Query Visualization

We are interested in text visualization. In particular, we are interested in a visualizations system for the display of both metadata and content (topic) information, external and internal information. The system would rely on a back-end (statistical) system for the summarization, classification, clustering etc. The user would interact directly with such a system. The visualization interface would allow the user to choose the dimensions, select subset of the data and the like (like XMDVtool or Spotfire) and to interact with the statistical systems. It would also visualize the results from the statistical analysis and the user would be able to combine and control the display of metadata and statistical output in any (?) combination. In particular, we are interested in a visualization system for medical text. More specifically, we envision a system for the analysis of medical text, a visualization system, that given 1) a collection of medical text (with metadata information) 2) an ontology from which the metadata is drawn 3) a statistical NLP system for the analysis of the text allows the user to analyze the textual property of the text, in combination (or not) with the metadata. -> Project goals: * survey of the research done in this area and research of the available tools for this kind of application. * identify the main task(s) for the use of such a system * propose an interface for such a system -> What kinds of results you anticipate achieving * We'd like to understand what are the new kinds of visualizations that would work for this particular domain. * We'd like to propose an interface for the system. -> What kinds of results you would like to achieve but which you probably do not have the time or the tools for. * We'd like to perform a formal evaluation of the proposed interface. * We'd also like to research if one could use existing visualization tools for this application and start implementing at least some functionalities of the systems.

Answers to ProjectProposal, provided by:

We are interested in working on the incremental display of data in a database (the CONTROL project). We anticipate looking at several different types of large data sets, and the queries/tasks that might be applied to them, in order to envision how the results could best be displayed for that data. Some examples might be information from the CDC or World Health Organization (showing the spread of disease or patterns of contagion), sales data from large corporations (showing what products sell best and where, what regions/sales people are performing best), census information (demographic trends), agribusiness data (crop yields, livestock production, trade surpluses), and stock market information (market trends, sector analyses).

Since we would be focusing mainly on display techniques, rather than programming details, we would likely be using some form of drawing software (such as Visio or PhotoShop), and we might also use something like Flash for animations, to replicate the displays we would suggest. Other visualization tools we have used in class, such as Xmdv, Eureka, and Spotfire, would provide inspiration for possible "matches" of CONTROL's on-the-fly query modification capabilities and these applications' ability to innovatively display information. Any appropriate visualization techniques, either from class materials or created by us, would of course also be considered, depending on the tasks required for querying the various data sets, and the needs of the potential users creating these queries.

Steps would include learning more about CONTROL, finding and examining sample data sets, designing display prototypes for different visualizations, and, possibly, having others review the visualizations to see how helpful they actually are.

Ideally, we would want to come up with some innovative and useful ways of allowing the data to be presented incrementally, so that users can more quickly and efficiently zero in on the information that will be most useful to them. We would love to be able to actually see one of our visualizations actually implemented and tested, but it is unlikely, with our non-technical backgrounds, that this would happen.

Answers to ProjectProposal, provided by:

Traffic Behavior and Analysis Project Goals: To create a tool and/or suite of tools for visualizing traffic data integrated from various sources, such as sensors under freeway roadbeds and freeway photos from newspaper www sites. We'd like the tool to be usable for a variety of users, perhaps starting with a simple vizualization of traffic flow, with more sophisticated and varied vizualizations available in an integrated (drill-down?) fashion for the more demanding user. Tools: Telegraph software, traffic sensor data from the PATH project, data from CalTrans WWW site, other. URLs for data: I-80 Freeway Status Data http://www.its.berkeley.edu/projects/freewaydata/ Berkeley Highway Lab http://www.cs.berkeley.edu/~zephyr/freeway/ Steps: 1) Design visualization framework. (When I click on X, I want Y to pop up in such-n-such a format...) 2) Start by implementing visualization of the traffic sensor data from I-80 3) Add in visualizations for other data sources (CalTrans incident reports, Bay Bridge photos) in an integrated fashion (click on a map, pull up relevant data...) Desired Results: Complete design of a general-purpose visualization framework for traffic (not Bay Area freeway specific), partial implementation. (Full implementation depends on access to data we may not be able to get.) Blue-sky, we wish we could: Complete implementation.

Answers to ProjectProposal, provided by:

As our SIMS masters' project, we are planning to build a prototype of a system to create and distribute customized travel guides to deliver to PDAs. This would enable users to create interactive guides based on their interests and needs, rather than purchasing a static, bland product designed for a generalized perception of what a generic traveler in a region may need.

One of the primary tasks we will need to support as part of this prototype is that of building the guide online using a web-based interface. Travel guide content has a large variety of nominal and ordinal attributes associated with it, much of which is relevant to users when building the guide. The search/filter task can be daunting given a large database of content, and the possibility for failed queries (0 hits) is high as more attributes are added. We are interested in evaluating the appropriateness of a visualization tool such as the Attribute Explorer to help in building a customized travel guide, specifically assisting with query preview and overview of results proportionate to the overall content available. We hope that a visualization representing the attributes of the content, the distribution of content across these attributes, along with the possibility to provide greater feedback will prove useful in helping users filter content, ultimately refining it to a guide that suits their interests and needs.

We propose an experimental design and implementation to evaluate the comparative usability of the VAE tool and traditional, form-based queries in the task of customizing a travel guide online. Given the timeframe of the project, we plan to focus on the comparative evaluation of the two interfaces, across a series of specific tasks.

We have been unable to locate an implementation or the source for the original Attribute Explorer tool. IBM Alphaworks has a Java-based tool, the Visual Attribute Explorer (creators Andy Smith and Simon Moore, available at http://alphaworks.ibm.com/tech/visualexplorer), which is based upon the original Attribute Explorer and appears to have much of the same functionality. The Visual Attribute Explorer (VAE) is intended as a tool to complement existing data mining tools created by IBM, but it is available in a stand-alone implementation as well, for the purposes of evaluation and comment.

It is unclear at this point whether we will have access to the source code for VAE, and we have not yet had a chance to evaluate the tool ourselves. So far, our attempts at installing the tool locally (both on Windows 98 and NT) have been unsuccessful, but we are still troubleshooting these problems. The VAE tool is designed to import comma-separated files, so, on its face, it does appear that the content we are obtaining for the larger project could be accessed using VAE.

Ideally, the purpose of this experiment is to evaluate the appropriateness of the Attribute Explorer model for this type of search/filtering activity, prior to building it from scratch. If it proves successful and effective for users, we would consider building a version, or incorporating successful aspects, that could be deployed on the web. (We recognize that we will need to consider other concerns, such as speed/responsiveness, in a distributed version of the tool.)

Answers to ProjectProposal, provided by:

Visualizing Media Ownership

Background
The structure of industries changes as companies merge into multinationals, spin off new businesses and buy & sell interests in one another. These complex economic relationships are difficult to grasp, much less track over time.

Our Proposal
Our project proposal is to construct a visualization which allows exploration of organization in the media industries, depicting full and partial ownerships and relationships among companies in that space as well as each company's relative weight based on various factors (TBD). As a starting point for this project, we intend to use media ownership data collected by Aaron Moore at the Online Journalism Review. We anticipate that our visualization could be a useful research aid for students of journalism, communications, or media studies.

We plan to build our visualization to display information in a web browser, pulling from a backend relational database (MySQL). Tables will include Firm, Owner, Ownership (or relationship). Given this format, updates or reuse of our visualization will require some massaging of input data. Though we hope to gain access to OJR's current database for this project, we will do some of the data input ourselves. For display, we will likely start with PHP output to HTML, but we may need to use a more detail-oriented tool (Flash?) to accomplish our goals.

Possible Extensions
We hope to design our display in a general enough manner that it could be used to examine ownership information among any related group of companies. In that case, it might aid economists and social scientists as well. With the addition of a date variable, our visualization could be further extended to analyze patterns over time, allowing the identification of larger industry trends as well as the correlation of such trends with contemporaneous events or legislation. It is not yet clear whether this information will be available to us during our data collection process.

Answers to ProjectProposal, provided by:

As a final project for the 247 Course I would like to use one of the data set provided by Anna Wichan\ sky, ORACLE. I want to have a look at the Videostore data and visualize it with a spatial analyst (Ar\ cView by ESRI) and try to develop a new way to look at periodical data such as sales (maybe with a si\ nus-function). I hope this proposal meets your criteria,

Answers to ProjectProposal, provided by:

Time-Centric Data Visualization for Networked Sensors

Abstract

Recent advances in micro-electronics and machine (MEMS) technology has enabled a new class of sensor technology. These sensing devices combine the power of larger sensors but in a physical form-factor of only a few square centimeters to microscopic size. Dubbed "networked sensors", they typically have an embedded processor capable of a few million instructions per second (MIPS), a limited amount of data storage (i.e. 4KB or less), a small power source, a short range wireless communication terminal (i.e. radio or optical) and an array of environmental sensors. While the functionality of an individual networked sensor is limited, a collection of them working in concert could achieve a wide variety of high-level tasks not possible with conventional sensing methods. Examples include fine-grained temperature gradient detection, wireless radio transmission and fading patterns and cell-level medical analysis.

As yet, there are still many technological hurdles to overcome to realize the full capability of networked sensors. Communication and power consumption are just a couple of these. However, another class of problems arises that are beyond the sensors themselves: data management and interpretation. A large sensor network has the ability to create large amounts of data in near real time. Being able to manage and interpret this data in a timely manner becomes paramount to the meaningful deployment of a sensor array. In addition to the sensor data, managing a large (i.e. hundreds or thousands) sensor array will require new tools and methods. Data visualization will be a key factor for both the data interpretation and sensor management tasks. Also, we believe that the nature of these tasks will be time-centric in nature. That is, they require the ability to visualize the data in real-time and navigate through and compare with historical input.

The purpose of this project is to prototype and test visualization methods for networked sensors. The project will involve the deployment of a networked sensors array, the collection of data from this array, the development of visualization tools and an evaluation of their effectiveness.

Project Goals

Tools Required

Steps Required

Results Anticipated

Future Work

Determining the usability and effectiveness of a visualization interface would typically require many user studies and a sophisticated method of feedback. We do not anticipate having time to do this given the time and the research nature of the tools.

Answers to ProjectProposal, provided by:

Visualizing Peer to Peer Distributed Networks

Rachna Dhamija, Danyel Fisher, Ka-Ping Yee

Goals: We propose to visualize a peer to peer network (such as Gnutella or Freenet). There are two directions this project could take:

1) We can incorporate the network visualization into an interface to help users search and retrieve files and help them find out information about other nodes (e.g., how reliable they are, how likely they are to have interesting content).

2) We can collect and visualize network data to help researchers/system designers answer questions such as:

1) First we need to formulate a tighter research question and choose one network to investigate (e.g., Gnutella, Freenet)

2) Next we need to get data! It is possible to instument a Gnutella client to collect data (once instrumented, we would be able to gather a significant amount of data over a short time period). It may also be possible to obtain data from a Freenet server that has been running for some time.

4) Exploration (preferably using existing existing viz tools, but this depends on how ambitious we are.)

Free Riding on Gnutella- Eytan Adar and Bernardo Huberman, Xerox PARC An analysis of user traffic on Gnutella shows a significant amount of free riding in the system. By sampling messages on the Gnutella network over a 24-hour period, we established that almost 70% of Gnutella users share no files, and nearly 50% of all responses are returned by the top 1% of sharing hosts.

Steve G. Steinberg's map of the Gnutella network - Steinberg modified a Gnutella client to perform the equivalent of traceroute and created the map using Graphviz. This graph was created during a static point in time, from the point of view of one node.

Bandwidth Barriers to Gnutella Network Scalability DSS Clip2, September 8, 2000- The scalability of a Gnutella network to accommodate more users performing more searches is limited by the lowest bandwidth links prevalent within the network.

Error and attack tolerance of complex networks Reka Albert, Hawoong Jeong & Albert-Laszlo Barabasi University of Notre Dame, Nature July 2000 (PDF) -The authors find that scale-free networks, including the Internet, display an unexpected degree of robustness- the ability of their nodes to communicate being unaffected even by un-realistically high failure rates. However, error tolerance comes at a high price in that these networks are extremely vulnerable to attacks (that is, to the selection and removal of a few nodes that play a vital role in maintaining the network’s connectivity).

Answers to ProjectProposal, provided by:

Visualizing Visualuzations

Motivation

Part of the problem is that the visualization tools are meant to be general, and are designed by programmers who don't know in advance what the specific characteristics of users' data are. As a result, it is difficult to design a tool that can be used in all cases.

If the programmers who have the capability to design the visualization tools don't have access to the data, then why not give the people with access to the data the ability to design visualization tools? Perhaps a sort of "Visualization Tool Construction Set" would be useful to allow to design visualizations that provide them with the most insight. Ideally, such a construction set would allow the user to design the displays graphically, instead of writing lines of code.

Proposal

Using palettes of display types (scatter plots, stripcharts, temperature graphs, etc.) and of data operations (mathematical, sorting, etc.), users would be able to view their dataset in a way that they understand because it is of their own design.

Process

Time permitting, a LabVIEW example will be constructed to demonstrate the concept. If possible, other users will be invited to try it and evaluate its effectiveness.

Answers to ProjectProposal, provided by:

DRAFT - Visualizing Museum Collections

The San Francisco Museum of Modern Art (SFMOMA) maintains a collection of art works diverse in media, origin, and domain. The collection is in fact a series of sub-collections, each overseen by one or more curators, none of whom have seen every object in their sub-collections, let alone the entire collection. A large percentage of such a collection is more often than not kept in storage, unseen by curators and the public alike. Physical browsing of the objects is not practical, existing curatorial records consist mainly of metadata, and browsing thumbnails alone is inadequate. In addition, the collection of any large museum such as SFMOMA is constantly growing and changing shape.

We propose prototyping a system, using existing software packages, that would allow curators and others at the San Francisco Museum of Modern Art (SFMOMA) to visualize aspects of the museum collection that remain difficult to grasp using existing tools. The Visualizing Museum Collections (VMC) tool would use data currently stored in the museum's collection management database (Embark) in conjunction with a visualization tool such as Spotfire to allow users to examine relationships either at the item level, that is within a sub-collection such as photography, at the collection level, or between collections, such as between the photography and the architecture collections.

As a simple example of how VMC might work, consider a variation of the Movie Finder application that Ben Shneiderman demonstrated in class, but with the y axis indicating the years since an item was last shown, and the x axis indicating the year an item was made. Then imagine that all sub-collections were checked and that the curator was able to use a set of sliders to zoom and filter on artist, style, time-period, medium, etc., to isolate an interesting combination of objects that hadn't been shown to the public in at least 10 years. Also imagine that a thumbnail image of any object could be viewed at any time either by rolling the mouse over a single data point, or by selecting a range of data points and then shifting to a thumbnail view mode.

While Spotfire may not be the tool we decide to use, it does suggest a key set of functions that we think the VMC system will need to provide. These can be expressed by repeating Shneiderman's handy "Information Visualization Mantra: Overview, zoom & filter, details-on-demand." Regardless of the visualization tool chosen, a large part of the project will entail designing effective custom views, interfaces, data element selection techniques, and data encoding scripts that will make sense in the museum context.

Answers to ProjectProposal, provided by:

Joanna Plattner

Gabriel Lucas

Project goals, including what kinds of tasks the interface containing the visualization is targeted towards.

SUMMARY:

Our goal is to design a visualization tool to show the relative value of networked referral sources.��

DETAIL:

Virtually every professional services business gets new customers from multiple referral sources.�� The resulting �referral chain� that develops over time can generally be described as a hierarchical tree.� However, in some cases customers can be traced to more than one parent, resulting in a more complex� �tangled tree� structure. For example, source 1 might have referred sources 2, 3, and 4.� Source 2 might have referred sources 5 and 6, source 3 might have referred no one, and source 4 might have referred source 7 and 3, who unbeknownst to them was already aware of the service.

The question every business should ask is, "How valuable is a particular referral source?"� Value could well mean some monetary quantity, like gross revenue or profit, or it could be some other quantifiable measurement.� Yet, answering this question is not as simple as producing a table with each source in column, and the value measurement in the other column.� The added complexity is the network of relationships.� In the above example, without source 1, the value from sources 2 through 7 would not have been realized. Without source 2, the value from sources 5 and 6 would not have been realized, and so on.� Thus, to truly determine the value of a particular source, one must look in the "referral tree" at all the child branches spawning from that source.

In addition to �value�, time is a dimension that we expect to be of particular interest to service oriented businesses.� In many cases referrals are perishable - becoming less and less valuable with time.� Possible use case scenarios involving the dimensions of time, value, and referral source include: 1) How has this person�s value to me changed over time?� 2) How has my network of clients evolved over time?��

Although we want to incorporate the time dimension into our visual analysis tool, before committing to it as a deliverable we must do more research and determine whether or not it is an achievable goal considering our time and resource constraints.

While our system will be designed and tested with a specific domain in mind, there are numerous alternative domains that for which it might be very useful.� One example is WWW hyperlink referral information.�� Web site administrators would surely value a visualization of all the web sites offering links to their site(s), along with how much traffic each link generated.

WHICH TOOLS WILL BE USED TO ACCOMPLISH THE GOALS (THIS CAN CHANGE IF NEEDED).

To accomplish our visualization goals, initially we are going to use Mineset�s Tree Visualizer tool, which we saw demonstrated in class on October 19.�� Fortuitously, a free evaluation copy of the software is available for downloading.� Tree Visualizer appears to have the key visualization features that we need.� It allows the simultaneous depiction of hierarchical relationships and quantitative values.�� However, it isn�t clear if Tree Visualizer allows �leaves� to be categorized under multiple branches.� If not we will have to resort to creative naming conventions to differentiate �children� based on their �parent�.

http://www.sgi.com/software/mineset/movies/dectree.mov

We may also experiment with Tree Maps if we can get access to the software.

WHAT STEPS WILL BE REQUIRED TO ACCOMPLISH GOALS.

We have a dataset that Gabe has been developing.� It has relationships among sources, and dollar amounts associated with those sources.� Thus, the data is pretty much ready.� There are a few decisions we need to make, including whether to simplify the data so that we can focus on the visualization part of the assignment.� For example, there are cases where a particular source has more than one parent source.� It may be best to discard these data points (at least initially), so that we can develop a prototype more quickly.

While we finalize the dataset, we plan to sketch some preliminary ideas and obtain user feedback, to determine what would be most useful to potential users.

Ideally, we will incorporate our visualization into a web interface.�� Implementation details will depend on the visualization tool we choose because the tool itself may or may not already be �web ready�.

WHAT KINDS OF RESULTS YOU ANTICIPATE ACHIEVING.

Ideally, our visualization tool will:

* Graphically demonstrate the relative value of sources

* Encapsulate the value of a particular source's child sources without overloading the user with too much data.� That is, if a source has many child sources, the tool should summarize the value of those child sources using some visualization technique, rather than display the specific data for those child sources at the parent source.

* Show the relationships among the sources

* Allow the user to zoom into a particular subset of the referral tree, in cases where the tree is large

* Allow the user to obtain the specific data for a source (perhaps by clicking on the source)

* Easily transferable to any domain or industry

WHAT KINDS OF RESULTS YOU WOULD LIKE TO ACHIEVE BUT WHICH YOU PROBABLY DO NOT HAVE THE TIME OR THE TOOLS FOR.

As mentioned above, we may not have time to consider complex data cases, such as sources' having multiple parent sources.� We may also not be able to consider time series data...that is, how profitable is a source over a particular time period?� Finally, we may not be able to give the user the ability to change the parameters in the algorithm used to summarize the value of child sources.� In our prototype, we would probably only be able to change those parameters within the actual program code.