IS 213 -- User Interface Design & Development

ReadingTree: Usability Study


 
 
 
 
 
 

 

Introduction

Method

Test Measures

Results

Discussion

Formal Experiment Design

Appendices

Link to Powerpoint Presentation, in-class 5/1



Introduction

Our user study tested the interface of the ReadingTree website. This test was designed to evaluate the perceived usefulness and usability of the system, according to our target user group--elementary school-age children who like talking about and finding out about books. We wanted to see children interacting with the system, performing both structured and unstructured tasks in order to see which features are most attractive to our users and which ones present difficulties.
We also wanted to confirm that we had sufficiently addressed the problems identified during the heuristic evaluation, and see how child users responded to the member rewards structure (newly added to our prototype).


Method

Participants
We tested five children, ages 6 to 11 (2nd through 5th grade). There were four boys and one girl. Three testers were children known to us through family or classmates, while the others were recruited through the Berkeley Public Library. We did not perform any formal screening, other than to ask the children whether they enjoyed both books and computers. None of the children were non-readers (i.e. all of them self-selected as enjoying reading to some degree) and each of them had prior computer and web browsing experience.

Apparatus
We conducted three of the tests in the upstairs computer lab in South Hall. These participants used Netscape Navigator 4.75 running on a Windows NT workstation with 19" monitor. The other two were conducted at the test subjects' home in Castro Valley. These participants accessed the Internet via the AOL browser, running on a home PC with 56k modem and 13" monitor. The computer was on a desk in the corner of the kitchen, next to but facing away from the family room.

Environment
Since we lacked a formal usability lab in either location, there were distractions in both. In the South Hall computer lab, other students were talking and working nearby, and the Campanile bells made it difficult to hear during at least one test. Additionally, the parents of each child returned about 45 minutes after the test began, which seemed to signal to the child that the test was over. This contributed in part to the paucity of responses to the post-test questions.

Tests conducted in the home also had distractions. Since the computer was located in one end of the kitchen, dinner preparations and other household sounds occasionally diverted the users' attention. The computer was also adjacent to the family room, the door to which was usually closed. However, siblings and friends playing video games quietly in the next room would periodically observe the proceedings for a few minutes, which also distracted the users to a degree.

Tasks
We had six primary tasks that we hoped to have the user perform. These were:

  • Sign up to become a member of ReadingTree. (Designed to test signup procedure.)
  • Find a book you have read and enjoyed. (Where the users were unable to find any books in our limited database, we directed the users to Harry Potter. Designed to test search capabilities and assess how kids were likely to search for a known item.)
  • Say what you think about it. (While this task was designed to lead to rating and reviewing, any feedback was permissible -- we allowed the users to determine, more or less, when the task had been completed. Designed to test rating, reviewing and possibly asynchronous communication.)
  • Find out what other kids think about this book. (This task was often completed before the above. Designed to see when/if/how other kids' opinions influenced their decision to read a book or not.)
  • Find a book you haven't read but think you might enjoy. (Designed to test how the system supports kids' unknown-item searches, including concept of the Bookshelf, book suggestions, Featured Books, What's Hot and The Yuck List.)
  • Remember this book for later. (Designed to test how our interface supports asynchronous book choice, i.e. how a kid remembers an interesting book when they are not able to acquire it immediately. Includes concept of the Bookshelf.)

While we tried as much as possible to lead each user through each task, we were also interested in unstructured browsing of the site, since a user could conceivably use the site often and quite happily without using all its features. During these tasks and the free exploration, we were concerned with five general areas:

  • Content (amount of writing, readability)
  • Navigation (easy for users to get where they want to go, intuitive for beginners)
  • System structure (users know what their options are)
  • Usefulness (of system, of specific features)
  • Appeal (graphics, color, level of interactivity)

Specifically, we were hoping to gain insights into the following questions:

  • What find-a-book methods do children prefer (searching for an item, browsing by subject, looking at a hand-picked list)?
  • What say-what-you-thought methods do children prefer (rating, reviewing, posting a message)?
  • What save-book-info methods do children prefer (save, print, write it down, tell someone)?
  • When/where do users want a help/site search/site map, if ever?
  • Do our terms make sense: Treehouse? Bookshelf? What's New?
  • Do users understand how book recommendations are generated/improved (and do they want to know?)
  • How could search function be improved?
  • Is the site interactive enough? (Was there enough to do here?)
  • Under what circumstances would children use this site? What would make them more likely to use it?

Procedure
We attempted to stick to the script as much as possible. After a brief introduction to the project members, the purpose of the site and the assurance that "we're not testing you," we asked the user a number of questions about their experience with computers and the Internet, and their opinions about books and reading. The user was then permitted to explore the site for a few minutes before beginning the formal tasks. This free-form exploration occasionally overlapped with one or more of the tasks. Since this test iteration was not concerned with deriving statistically valid data, the order in which the tasks were completed (and indeed, in some cases, whether a task was completed at all) was not enforced.

Rather than ask the users to think aloud, which seemed artificial and difficult for kids, the facilitator asked questions during the testing about why certain tasks were completed in certain ways and whether completion seemed difficult or confusing. The facilitator tried to strike a balance between maintaining a steady flow of communication with the user and distracting the user with continual questions.

Since a large part of what we hoped to discover pertained to overall satisfaction with the site and an understanding of what's available and how to use it, we decided not to curtail site exploration in favor of task completion. Wherever possible, the facilitator attempted to remind the user of the task at hand, or ask about the user's interest in some aspect unrelated to the current task.

Once the tasks had been completed, or about 30 minutes of interaction with the system elapsed (whichever came first), we stopped the testing and asked several followup questions pertaining to overall satisfaction with the system, appropriateness for younger kids and perception of the relationship between kids who like computers and kids who like to read. We then solicited open feedback about anything they wanted to tell us about their experience with ReadingTree.

Testers received a $10 gift certificate to either a local bookstore (for children who had indicated previously that they loved to read) or to a local movie theater.



Test Measures

We were interested in observing two main types of user/system interaction - their ease in performing tasks (e.g. could they sign up, or obtain book suggestions), and their ability to notice and understand system functionality (e.g. could they figure out what to do with a bookshelf, or understand the results of a poll).

Test measures included the time it took to complete each task, the path taken, the number of questions asked, the number of times the facilitator had to help the user, as well as user-stated ease of use and satisfaction ratings.

We collected data in two ways:

  • One observer used an event logging spreadsheet, developed by Anoop Sinha from the GUIR group of Berkeley's Computer Science Department. This tool allowed the observer to quickly track user movements and basic interactions.
  • Another observer took notes on a hardcopy version of the task list and questionnaire. This observer concentrated on recording user comments and user/facilitator interactions.
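The kind of record our observers kept can be approximated with a small script. This is a hypothetical sketch of our own devising - the structure and field names below are our assumptions, not a description of the GUIR event-logging tool:

```python
from dataclasses import dataclass, field
import time

@dataclass
class TaskLog:
    """Minimal per-task event log: one record per task, per user."""
    task: str
    start: float = field(default_factory=time.time)
    end: float = 0.0
    path: list = field(default_factory=list)  # pages visited, in order
    questions_asked: int = 0                  # questions the user asked
    facilitator_helps: int = 0                # times the facilitator stepped in

    def visit(self, page: str) -> None:
        self.path.append(page)

    def finish(self) -> None:
        self.end = time.time()

    @property
    def duration(self) -> float:
        return self.end - self.start

# usage: log one task for one user
log = TaskLog(task="Sign up")
log.visit("Home")
log.visit("Sign Up Form")
log.questions_asked += 1
log.finish()
```

A record like this captures the same measures listed above (time on task, path taken, questions, facilitator interventions) in a form that is easy to tabulate afterwards.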


Results

Because our tasks were somewhat open-ended, we were not able to observe every user performing every task available in ReadingTree. Furthermore, during the tests we recorded these evaluations of understanding, ease of use, and enjoyment anecdotally (through observations and recorded answers to questions), rather than systematically. However, some general patterns did emerge.

User typing and spelling skills strongly affected usability
Though all of our users reported using computers at home and school, they seemed more comfortable with the mouse than the keyboard. During the test they all typed slowly and deliberately, and were very careful not to make mistakes. While this meant that the user (usually) produced an error-free document, it also led to some user frustration, and activities like sign-up and book reviews took a long time to complete (see User Two, Task B notes).

Spelling was also a concern for the kids - slightly misspelled author names returned no hits during search, and a couple of users were hesitant while typing reviews, warning us throughout the tests that they were bad spellers. Fear of posting reviews with misspelled words could keep some kids from posting - something we need to watch closely in further tests.

Users who liked points REALLY liked points
(Note - We added a bare bones Member Rewards section - consisting of member levels, certificates and bookmarks - after submitting but before testing this version of Reading Tree.)

While all of the users noticed they received points for rating/reviewing books or answering polls, and most visited the member rewards page from the link on their Bookshelf, two of our five users became very interested in points - what they could get, what level they were at, etc. For one user getting and monitoring his points was the primary concern, and he was interested in completing tasks only if they would help him move to the next member level.

Introducing points was intended to increase and reward Reading Tree member participation. While it motivated all users to varying degrees, we may have to accept that for some users acquiring points may be a bigger motivator than a simple love of reading.

Screen size mattered
Though we modified our screens to be usable on an 800 x 600 monitor, we found that small screen size had some adverse effects on usability. Important functions (such as new member sign-up) displayed completely below the fold, and pages became more difficult to read as column width narrowed.

Kids read content - especially if it's from other kids
All of the users carefully read and commented on kid reviews and bulletin board posts. Users wrote their own reviews carefully and read others' reviews critically, with clear ideas about the kind of information they wanted (e.g. recommendations like "If you liked Book A you'll like Book B"). They placed a lot of value on what other kids had to say.

A variety of search options are used (if not always useful)
Different users had different preferences for searching and navigating. Most users favored either title/author search or subject search, but all were able to use both during the test. Users were less likely to use the letter-based search, though it is unclear whether they did not see that option or did not understand it.

Visual representation of the user's performance during the study:

Additional data:


Discussion
We learned a great deal from our pilot study.

1. The site appeals to children who like to read. Four of our five users said that they would use this site again and would recommend it to a friend. (We are aware, however, that these results may reflect the children's desire to tell us what they think we want to hear, rather than their true feelings.)

2. The basic navigational structure supports most users in achieving primary goals of our design personae, Danny and Jenny: finding a book they would like to read and letting other kids know about a book they have read. One of our users, however, experienced a great deal of trouble in accomplishing the tasks and requested that the test be concluded early. This could mean that our design is somewhat frustrating for younger, less Web-savvy and less book-loving kids. This would need to be investigated further through more user testing.

3. There are several issues with our page layout and form design that must be addressed if the website is to succeed with this user group. Many of these changes are simple, and are necessitated by variation in monitor sizes and by our users' limited ability to type and spell.

Sign Up

Sign Up option should appear above the sign-in boxes on the home page.

On the "Sorry, you're not a member" page, Sign up needs to appear above the fold. When sent to this page in error, kids did not know what to do to correct the problem, because the sign-up forms were not immediately visible.

Also, if a user makes a mistake in signing up, they should not be required to enter all of the information again, only the problematic items. This was an error-handling element that we meant to implement but decided to postpone; we saw in the user tests that it is an essential feature. It took one child 15 minutes to complete the sign-up process.

Sign-up field labels should align with the fields. Even the small misalignment that resulted when the site was displayed on a smaller monitor led to unnecessary confusion.

Find a Book

Title search and author search need to be moved farther apart. Having them grouped together so closely caused every user who attempted a title search to enter the author's name as well.

Kids were very frustrated when they searched for a book and received no results. Until our database is more complete, we should consider allowing them to add books. Also, when no results are returned the system should send the user back to the Find Book page automatically and perhaps provide tips for improving the search results.

Spelling is also a challenge. A future version of our system should offer search results that include "near misses"-- titles and authors that match the search criteria by all but one or two letters.
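Near-miss matching of this kind need not wait for a full spell-checker. As one possible approach (not part of the current prototype), Python's standard-library difflib can rank candidate titles by similarity; the book titles below are placeholders:

```python
import difflib

def near_miss_search(query, titles, max_results=3):
    """Return titles that approximately match the query.

    difflib.get_close_matches scores each candidate by a similarity
    ratio (default cutoff 0.6), so a title that differs from the query
    by only one or two letters is still returned.
    """
    return difflib.get_close_matches(query, titles, n=max_results, cutoff=0.6)

# placeholder catalog
titles = ["Harry Potter", "Holes", "Matilda", "Charlotte's Web"]

# a misspelled query still finds the book
print(near_miss_search("Hary Potter", titles))
```

A production search would also want case-insensitive comparison and author-name matching, but the same cutoff-based ranking applies.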

Review a Book
"Review a book" link needs to be made more prominent on the book information page.

On the "review book" form, we should consider how to revise the labels to make it clear that a) each box is optional and b) the writing does not need to be formal.

Member Rewards

Because the point system clearly had a strong appeal, our design needs to make it easy for kids to find out how to earn and spend their points. Links to information about member rewards should appear on every page of the site, including the "thanks for rating/reviewing/answering a poll" pop-up page.

Answer Polls

We either need to clarify that the vote does not get submitted until the user clicks Rate it, or we need to change the system so that the vote is submitted automatically, on the first click, perhaps with a confirmation message providing the option to cancel it.

Changes for the "Real" Experiment

We would like to address the issue of self-selection by testing with users who did not self-identify as "kids who love to read." More generally, we would like to test a broader range of user types, of different genders, ethnicities and cultures, as well as different levels of computer skills.

We found this pilot study useful for learning how the site's basic features and structure would be received. We did not require users to complete every task on the list. For our real experiment, we would take a slightly more formal and structured approach, asking users to complete every task in a specific order. We might also use a grid similar to the one shown in the Results section, to be sure to capture the same information for each task and user.

In gathering feedback from the kid testers, we would design a form to collect quantitative data on satisfaction and ease of use. Following recommendations from Allison Druin and others, we would use graphical images such as smiling and frowning faces as anchor points on vertical rating scales. We could read our questions out loud and ask the children to mark the point on the scale to indicate "how much" of something is true. According to some researchers (Risden, Hanna, and Kanerva 1997), children respond more reliably to this sort of representation, which incorporates meaningful images and concepts of more and less (using a vertical rather than horizontal scale).


Formal Experiment Design

We propose to study two variations of our page design. Our current prototype has a horizontal orientation, with two or three columns of content, so that the pages are wide, rather than long. We believed that this design would allow children to more easily discover and access ReadingTree's features. We are curious to know whether our current design succeeds in this goal or whether a vertical page orientation, which requires more up-down scrolling but which is also more conventional for Web interaction, would be more usable and satisfying. We also wonder whether a user's goal in visiting the site will make a difference to the user's design preference.

Hypotheses: Our first hypothesis is that users will prefer the horizontal page design, regardless of their goals in visiting the ReadingTree site. Our second hypothesis is that users will spend more time at the site and discover more features with the horizontal design.

Factors and Levels:

Our tests would focus on the most content-intensive pages (Find Book, Treehouse, and What's New) as these pages would be most dramatically affected by the page reorientation.

Independent Variables:

  • Page design: horizontal (as in current prototype) vs. vertical (would need to be implemented--here is an example page).
  • User goals. We would select users of two kinds: children who read for pleasure frequently and children who read only for school assignments. The teacher would make the first assessment, which we would validate with a "reading habits" questionnaire.
  • Gender. We would hope to have equal number of boys and girls in each test group so as to minimize (and identify) any gender-based differences.

Dependent:

  • Ease of use ratings
  • Subject satisfaction rating
  • Success at completing task (able to find a book they would want to read?)
  • Length of time using the system (at what point do they become bored?)

Plan to Control Confounding Variables:

  • We would limit our user base to those in a single grade, or better still, at the same reading level (as assessed by a teacher) to minimize the confounding variable of reading ability.
  • We would also seek to recruit users with a minimum level of web experience, so that difficulties resulting from lack of knowledge of basic Internet conventions (for example, using underlining to indicate a hyperlink) will not confound the results. Level of experience will be assessed through a screening questionnaire that asks about current number of hours per week of Web use and number of years using the Web.

Blocking and Repetitions: We would use a 2 X 2, between-groups design. There are three reasons we opted for a between-groups design. First, users in our target age group have a limited attention span so asking them to test two systems in a single session is not realistic, unless we abbreviate the tasks and interview sessions. However, asking users to participate in two separate sessions is also not workable, as it is difficult to recruit minors for even one experimental session. Learning effects are a third concern--the second interface tested would most likely be received more favorably because it is more familiar.

                      Reads for Fun      Does Not Read for Fun
                      Boys    Girls      Boys    Girls
Horizontal Layout      5        5          5       5
Vertical Layout        5        5          5       5

Ideally, and provided that we had the full cooperation of an elementary school (or schools), we would test 10 users in each cell, 5 boys and 5 girls, to allow us to complete a thorough statistical analysis of the results.
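For a categorical outcome such as task success, the analysis could start with a chi-square test of independence on the layout groups. The sketch below uses entirely hypothetical counts (18 of 20 horizontal-layout users vs. 12 of 20 vertical-layout users completing the find-a-book task) purely to illustrate the calculation:

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table:

               success  failure
    group 1       a        b
    group 2       c        d
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# hypothetical counts: 18/20 horizontal-layout users vs. 12/20
# vertical-layout users completed the find-a-book task
chi2 = chi_square_2x2(18, 2, 12, 8)
print(round(chi2, 2))  # 4.8

# the critical value for df=1 at alpha=0.05 is 3.841, so this
# (made-up) difference would count as statistically significant
```

Continuous measures such as time on site would instead call for an analysis of variance across the layout and reader-type factors.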


Appendices

Materials

Recruitment flyer

Consent form

Script, with pre- and post-test questionnaire

Raw Data

List of design problems and programming bugs

User Data (user responses and our observations)

Event logger (Excel spreadsheet--includes observer comments)

Summary of pages visited (Excel spreadsheet)

Fix list for Prototype 3