Paparazzi Pilot Usability Study

Paparazzi is a system for topic-based browsing of blog information, providing context and information about bloggers and the conversations they engage in across blogs.

We conducted testing using Paparazzi 3.0, iterated from the second interactive prototype before testing began. Major changes included adding color, revising vocabulary, fixing minor bugs, modifying an icon, and modifying the formatting. In addition, we doubled the data set.

Purpose and rationale for the pilot usability study

The purpose of the study was to observe, quantify, and understand how users reacted to the system; gain insight into their understanding of the system and its data; and gather quantitative information about the pathways and uses of the system in terms of time, clicks, and efficiency. This information would then be used to plan a larger, more involved user study.

Test participants

The ideal test participants would be users of Paparazzi and of other blog and news information sites such as Feedster, of particular blogs such as Instapundit, or of news aggregation tools such as NewsGator. Users who actively browse and search blog and news content online are generally more familiar with the construction and linking practices of blogs and with the quirky styles blogs sometimes maintain, but also with the possibility of finding far more interesting and knowledgeable information generated by experts than appears in general news reports. These users often filter mainstream news through blogs, and need to search or topic-browse for the information they are looking for. As a result of this experience and exposure, these users may offer specific opinions on a topic-browse blog-search site’s content, layout, features, and functions. They may also be more adept at recognizing and proposing changes or additions to the content, features, and functions of blog news sites.

We chose three test subjects who use online blog and news sources. These individuals offered insight into our system and into roadblocks that would keep them from utilizing its features or the system as a whole (names have been changed to maintain participant confidentiality).

Structure of test

Testing included:

- a short written pre-test questionnaire
- an explanation of the system (but not of the tasks)
- a set of tasks, given one at a time, with the participant talking through their actions
- a post-test interview about the system and the participant's experience

Users were tested on a 15” Mac PowerBook running the Safari web browser, with an attached Logitech optical mouse. Two of the users were familiar with Macs and one was not, though all seemed to adapt quickly to the limited keyboard input and frequent mouse clicking the tasks required.

Participants' activity was recorded with a digital video camera. The camera recorded the computer screen, but not the participant (other than voice).

All three tests were conducted on the same day. Each participant appeared individually, filled out the consent forms, and completed a short written questionnaire (below) to get them into the "right" frame of mind. The participant was then taken to a table, where the video camera was positioned to record the screen. The participant was given an explanation of the system, though not of the tasks. The tasks were then given to the participant, one at a time, and we took notes alongside the video recording. After the tasks were completed, the participant was questioned about the system and their experience with it.

Test evaluation criteria and discussion

While we see the importance of quantitative measures, we believe that we are still at the stage where qualitative feedback is more useful. That being said, we identified possible test measurements, and after the test we discussed how effective they were.

Possible test measurements for each scenario:

- whether the task was completed correctly
- time taken to complete the task
- number of clicks used
- how closely the participant's path matched the most efficient path to success

While we took quantitative measurements, in general they did not work very well during this test. Almost every task was completed correctly, and in the one case where a task was not completed correctly, it is difficult to tell whether the cause was not understanding the task or not understanding the interface. The per-task time measures turned out to be neither accurate nor valuable, because users were asked to talk through their actions, which often significantly slowed their activities and caused them to experiment, read back what they were looking at, and explain their thoughts to us. The most efficient path to success also turned out not to be a good measure, because there were often two ways to go, with an equal number of clicks to reach the information.

Qualitatively, though, we learned that users who do things via a particular path or style often replicate that path over and over, across different tasks. Therefore, judging one path as most efficient may not have value if there are similarly efficient ways to go about a task. Moreover, tasks typically took only two or three clicks to find the requested information, so we generally deemed the click-count measure not very valuable. Also, because users were “playing” with the interface a bit (they would often land on the page with the information and then click around to see what else was there), we did not get good click analysis from the backend system, and it was difficult to follow their rapid clicks manually when this occurred.
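For a follow-up study, per-task click counts and completion times might be computed offline from the backend logs rather than by hand. The sketch below is a minimal, hypothetical example in Python; it assumes a tab-separated click log with timestamp, participant, task, and page columns, which is an assumption on our part rather than the actual format of the Paparazzi backend:

    import csv
    from collections import defaultdict
    from datetime import datetime

    def summarize_clicks(log_path):
        """Summarize clicks and elapsed time per (participant, task).

        Assumes a hypothetical tab-separated log with columns:
        timestamp (ISO 8601), participant, task, page.
        """
        clicks = defaultdict(int)    # (participant, task) -> click count
        stamps = defaultdict(list)   # (participant, task) -> timestamps
        with open(log_path, newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                key = (row["participant"], row["task"])
                clicks[key] += 1
                stamps[key].append(datetime.fromisoformat(row["timestamp"]))
        for key in sorted(clicks):
            elapsed = (max(stamps[key]) - min(stamps[key])).total_seconds()
            print(f"{key[0]}, task {key[1]}: {clicks[key]} clicks, {elapsed:.0f}s")

    if __name__ == "__main__":
        summarize_clicks("click_log.tsv")

Exploratory "playing" clicks would still need to be filtered out, for example by discarding clicks recorded after the participant's first arrival at the task's target page.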

Test results - what we learned

Raw interview notes

Conclusions

Appendices and documents

Pictures of setup

Pre-test questionnaire (.doc)

Post-test interview questions

Tasks

Script

Consent form (.doc), Release form (.doc)