Steve Lam
SIMS 213
Pilot User Study
Introduction
System Introduction
The system being evaluated is an interface for anti-spam tools. The tools in the interface are arranged according to where they sit within the overall system. This spatial placement lets the user see the system as a collective whole and understand each tool in relation to the other structures in the system. It makes the interface far more intuitive than a plain list, which gives the user no insight into what the tools do or how they affect the rest of the system. With that intuitiveness comes a better assessment of the problem when the system needs troubleshooting.
Purpose of the Study
The purpose of this study is to evaluate users' actions and reactions to the system. The evaluation will measure how efficiently users perform tasks. It will also confirm whether the schema of the spatially represented interface fits users' mental models of the system and its tools. Other parameters to be measured are the validity of users' verbal explanations and their alertness to the different visual events presented in the interface, such as the "emergency" links and the buttons that initiate different functions within the system. These subjective parameters will be assessed through the users' think-aloud reactions to the interface, to see whether the system is predictable to the user.
Method
Participants
The participants were selected based on their familiarity with the system. They had the following profiles:
Apparatus
I performed the study at the BLOC (Bright Light Operations Center). The computer used to run the interface was a Pentium II 266 MHz running Windows 95. While the interface was running, I used a stopwatch to measure task times and a separate computer on which I typed the users' reactions and procedures and recorded any critical incidents.
Tasks
Procedure
To ensure consistency of instruction and to rule out interference from outside variables, such as a poor explanation of the interface or unfamiliarity with basic operations like closing a window, I gave the same demonstration and allowed only one dry run through the system for all three participants. The dry run was a task I chose so that all users would understand it and could observe and grasp the basic operations of the overall system.
For each participant, I introduced the system by showing them the interface and the locations of the tools. Then I performed a dry run, talking through each procedure so that they knew how the interface would operate and which portions of it were functional. After my demonstration, I let them familiarize themselves with the particular procedures for selecting tools and closing windows, rather than with the interface itself. I then instructed them to explain to me every process that was not self-evident, such as their reasons for performing certain operations on the interface. In other words, I instructed them to think aloud.
While the three participants performed the three tasks, I measured the time it took each participant to complete each task, excluding time spent on the system's background processes. I also recorded, step by step, the actions each participant took to complete the tasks.
Test Measures
Results
Initially, the users took longer to find the tools needed to perform each task. This time improved for each user with each consecutive task, as they learned the system from the previous task. Aside from these initial difficulties, the users seemed to find the new interface much more usable and intuitive. They especially liked that each window opened in its own default position without obstructing the view of other windows. They also liked that other functions could run without interrupting current processes.
For the first task, after the users logged in, they scanned the whole interface and looked for the link that would bring up the alert window. After clicking on it, they proceeded normally and were able to write and store a rule efficiently. The opening of the windows and the system's feedback for each operation also did not surprise the users. At one point, Participant A noted that he liked having the message window linked with the detail window. The average time to complete the whole task across the three participants was about 31 seconds: Participant A took 25 seconds, Participant B took 41 seconds, and Participant C took 28 seconds.
In the second task, the users had already become competent with the new interface. Their think-aloud comments showed that they had learned the scheme of the graphically enhanced system. After finding the tool under the icon that represented it, each user proceeded with the task at hand. Performance on the traceroute task also improved: the users were able to perform the traceroute and view messages at the same time, and they were pleased that they did not have to sacrifice a window in order to run the traceroute. The average time to complete the task across the three participants was 44 seconds: Participant A took 47 seconds, Participant B took 34 seconds, and Participant C took 51 seconds.
For the third task, the users were already highly familiar with the interface. They began the task by visiting the rules pending page first. When they opened the window, they said they liked that it opened in a default area and found that it did not obstruct other functions they were performing in the system. The average time to complete the task across the three participants was about 19 seconds: Participant A took 17 seconds, Participant B took 23 seconds, and Participant C took 16 seconds.
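The task averages above follow directly from the individual completion times; a minimal sketch of that calculation (in Python, using the per-participant times reported above) is:

    # Per-participant completion times in seconds, as reported above.
    times = {
        "Task 1": {"A": 25, "B": 41, "C": 28},
        "Task 2": {"A": 47, "B": 34, "C": 51},
        "Task 3": {"A": 17, "B": 23, "C": 16},
    }

    for task, by_participant in times.items():
        avg = sum(by_participant.values()) / len(by_participant)
        print(f"{task}: average {avg:.1f} s")
    # Prints: Task 1: 31.3 s, Task 2: 44.0 s, Task 3: 18.7 s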
Overall, the users' times varied, and the variation did not correlate with expertise. They were consistent, though, about the newly designed features they liked, such as the graphical layout, the default window positions, and the confirmation questions the interface asked before continuing a time- or resource-consuming task.
In the follow-up interview, the users said that they did not like the term changes for specific tools: they thought the "trace source" tool was a different tool rather than the "traceroute" tool they were familiar with. I also presented some of the heuristic evaluator's suggestions for improvements that I had not yet applied to the interface. They said they wished the interface would keep its current configuration because that is what they were trained on, and that this would also make it easier to communicate the terminology used on the interface to administrators, since it would be clearer and more concise. They also did not want each rule to keep reporting that it was active, since the spamwall acknowledgements would already tell them that; they were concerned that all 30,000 rules giving feedback would obstruct further processes.
Discussion
What was learned from the pilot study?
The users do react better to a graphical interface, even though they were trained on one that was list-based. This finding may extend to completely new users: since users already familiar with the old interface could quickly pick up the new system, new users may be able to perform with about the same ease. Another lesson learned was that the terminology should not be changed hastily. Because troubleshooting involves both the users of the system and the administrators, they have already established a shared vocabulary for the problems and the tools. Changing that vocabulary would mean not only changing what the users have already learned, but also making people who do not work with the system directly, and who perform other duties, learn a whole new vocabulary.
What would be changed for the experiment
Instead of telling users to perform a task, I will tell them to open a tool or a window and time that process. This may isolate spatial problems with the tools' placement that do not show up during full tasks, where there are too many variables. Rather than do a dry run with the interface, I may simply open up the system and tell a new pool of participants to run through it by themselves; this would show whether the interface is truly intuitive. Other variables may also have contributed to the improved response times and quick learning: the users may have memorized what I did, and since the dry run was domain-neutral, they did not really receive a fresh initial introduction to the system. The informal experiment was also really a "toy-world" run-through. In the formal experiment, I would create a scenario where the system actually goes down and see how the users react to an emergency and whether they utilize the different features of the system.
Changes in the interface due to the experiment
Because the users did not express concern about the initial sizes of the windows, I may enlarge the more important ones, or the ones that are more commonly used, so that users have a larger workspace and can look at more information at once. Since the users did find some of the new terminology difficult, I may change it back and see how they perform.
Formal Experiment Design
Hypothesis
Some windows or tools may show a dramatic difference in user response time when measured individually. Also, if a realistic emergency is staged as a drill, the users' response times may be faster, and their ability to convey the problem at hand may be better, than with the old interface.
Factors and Levels
Independent Variables: The factor varied in the experiment would be the task scenario. Instead of telling users to perform a full task, I would tell them to open certain tools or perform certain functions in isolation from the rest of the system. For example, I would say to the participant, "Find the traceroute tool and bring it up" or "Write a rule and store it." This would allow me to evaluate the placement of each tool and the users' reaction times independently of the other tools. For the emergency situations, I would hard-code a piece of the interface to behave as if it were malfunctioning, then leave the participant alone so that he believes the system is up and running. This would remove the observer's paradox, in which the user looks to the experimenter for clues or acts differently because an observer is present. Because the user believes the situation is real, he may also behave in a more natural manner.
Dependent Variables: In the formal experiment, I will measure the time it takes to open each window and the users' searching techniques. I will also measure the time it takes to resolve an emergency situation and the accuracy of the information conveyed to the administrator when the system goes down.
If the users show a dramatic difference in times across the tasks I give them, then the windows should be reorganized so that everyone reaches an optimal performance level. If not, I may have to look at the system as a whole and at where in each task the users are stumbling. For the emergency situation, if the users have a difficult time assessing the problem and conveying to the higher authority what is wrong, the emergency protocols will have to be redesigned so that the interface "leads" them through such a situation.
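As an illustration of how the per-tool timing data might be summarized to spot such differences, the following is a minimal sketch in Python; the tool names, the placeholder numbers, the 1.5x threshold, and the helper function are hypothetical and not part of the planned interface or the study data:

    from statistics import mean, stdev

    def summarize_tool_times(times_by_tool):
        """Report the mean and spread of open times (seconds) for each tool.

        times_by_tool maps a tool name to the list of open times recorded
        across the participants. Tools whose mean is well above the overall
        mean are flagged as candidates for reorganization.
        """
        overall = mean(t for ts in times_by_tool.values() for t in ts)
        for tool, ts in sorted(times_by_tool.items()):
            m = mean(ts)
            s = stdev(ts) if len(ts) > 1 else 0.0
            flag = "  <-- much slower than average" if m > 1.5 * overall else ""
            print(f"{tool}: mean {m:.1f} s, stdev {s:.1f} s{flag}")

    # Example usage with placeholder numbers (not real study data):
    summarize_tool_times({
        "traceroute": [6.2, 5.8, 7.1, 6.5],
        "rules pending": [3.1, 2.9, 3.4, 3.0],
    })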
Blocking and Repetitions
For the first measurement, I will disable the functionality of all the tools except the single tool being measured. I will ask each individual to find that tool and then move on to the next tool. I will repeat this across ten individuals in order to obtain an accurate estimate. For the second task, I will have to run a single-blind test in which the individual does not know what is going on and believes that the system is up and running. I will repeat this with only five individuals, because the task may be time-consuming and, if the interface is truly unintuitive, that will show immediately.
Appendices
Materials
Instructions
This is an interface I am testing on you. The operations are the same as in any windowing system: you press the X in the corner to close a window, and the buttons on the interface are interactive. I will first go through a dry run of the interface, and then I will let you familiarize yourself with it by "toying" around with it. After that, I will have you perform three tasks of my choice.
Demo Script
Follow-up Interview
Now that you have done the three tasks, what do you think of the interface? Please feel free to make any suggestions or express any difficulties in performing the tasks.
The following were noted in another person's evaluation of the system. Do you agree with these points, and would you like the interface to conform to these recommendations?
Raw Data
Participant A
Task 1
Task 2
Task 3
Participant B
Task 1
Task 2
Task 3
Participant C
Task 1
Task 2
Task 3