Steve Lam
SIMS 213
Pilot User Study
Introduction
System Introduction
The system being evaluated is an interface for anti-spam tools. The tools in the interface are arranged according to where they sit within the overall system. This spatial placement lets the user see the system as a collective whole and understand each tool in relation to the other structures in the system. It makes the interface far more intuitive than a plain list, which gives the user no insight into what the tools do or how they affect the rest of the system. With that intuitiveness comes a better assessment of the problem when the system needs troubleshooting.
Purpose of the Study
The purpose of this study is to evaluate users' actions and reactions to the system. The evaluation will measure how efficiently users perform tasks. It will also confirm whether the schema of the spatially represented interface fits users' mental models of the system and its tools. Other parameters to be measured are the validity of users' verbal explanations and their alertness to the different visual events presented in the interface, such as the "emergency" links and the buttons that initiate different functions within the system. These subjective parameters will be assessed through the users' think-aloud reactions to the interface, to see whether the system is predictable to the user.
Method
Participants
The participants were selected based on their familiarity with the system. They had the following profiles:
Apparatus
I performed the study at the BLOC (Bright Light Operations Center). The computer used to run the interface was a Pentium II 266 MHz running Windows 95. While the interface was running, I used a stopwatch to measure task times and a separate computer on which I typed the users' reactions and procedures and recorded any critical incidents.
Tasks
Procedure
To ensure consistency of instruction and to rule out interference from outside variables, such as a poor explanation of the interface or unfamiliarity with basic operations like closing a window, I gave the same demonstration and allowed only one dry run through the system for all three participants. The dry run was a task I chose so that all users would understand it and could observe and grasp the basic operations of the overall system.
For each participant, I introduced the system by showing them the interface and the locations of the tools. Then I performed a dry run, talking through each procedure so that they knew how the interface would operate and which portions of it were functional. After my demonstration, I let them familiarize themselves with the particular procedures for selecting tools and closing windows, rather than with the interface itself. I then instructed them to explain to me every process that was not self-evident, such as their reasons for performing certain operations on the interface. In other words, I instructed them to think aloud.
While the three participants performed the three tasks, I measured the time it took each participant to complete each task, excluding time spent on the system's background processes. I also recorded, step by step, the actions each participant took to complete the tasks.
Test Measures
Results
Initially, the users took longer to find the tools needed to perform each task. This time improved for each user with each consecutive task, as they learned the system from the previous task. Aside from these initial difficulties, the users seemed to find the new interface much more usable and intuitive. They especially liked that each window opened in its own default position without obstructing the view of other windows. They also liked that other functions could run without interrupting current processes.
For the first task, after the users logged in, they scanned the whole interface and looked for the link that would bring up the alert window. After clicking on it, they proceeded normally and were able to write and store a rule efficiently. The opening of the windows and the system's feedback for each operation also did not surprise the users. At one point, Participant A noted that he liked having the message window linked with the detail window. The average time to complete the whole task across the three participants was about 31 seconds: Participant A took 25 seconds, Participant B took 41 seconds, and Participant C took 28 seconds.
In the second task, the users had already become competent with the new interface. Their think-aloud comments showed that they had learned the scheme of the graphically enhanced system. After finding the tool under the icon that represented it, each user proceeded with the task at hand. Performance on the traceroute task also improved: the users were able to perform the traceroute and view messages at the same time, and they were pleased that they did not have to sacrifice a window in order to run the traceroute. The average time to complete the task across the three participants was 44 seconds: Participant A took 47 seconds, Participant B took 34 seconds, and Participant C took 51 seconds.
For the third task, the users were already highly familiar with the interface. They began the task by visiting the rules pending page first. When they opened the window, they said they liked that it opened in a default area and found that it did not obstruct other functions they were performing in the system. The average time to complete the task across the three participants was about 19 seconds: Participant A took 17 seconds, Participant B took 23 seconds, and Participant C took 16 seconds.
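The task averages above follow directly from the individual completion times; a minimal sketch of that calculation (in Python, using the per-participant times reported above) is:

    # Per-participant completion times in seconds, as reported above.
    times = {
        "Task 1": {"A": 25, "B": 41, "C": 28},
        "Task 2": {"A": 47, "B": 34, "C": 51},
        "Task 3": {"A": 17, "B": 23, "C": 16},
    }

    for task, by_participant in times.items():
        avg = sum(by_participant.values()) / len(by_participant)
        print(f"{task}: average {avg:.1f} s")
    # Prints: Task 1: 31.3 s, Task 2: 44.0 s, Task 3: 18.7 s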
Overall, the users' times varied, and the variation did not correlate with expertise. They were consistent, though, about the newly designed features they liked, such as the graphical layout, the default window positions, and the confirmation questions the interface asked before continuing a time- or resource-consuming task.
In the follow-up interview, the users said that they did not like the term changes for specific tools: they thought the "trace source" tool was a different tool rather than the "traceroute" tool they were familiar with. I also presented some of the heuristic evaluator's suggestions for improvements that I had not yet applied to the interface. They said they wished the interface would keep its current configuration because that is what they were trained on, and that this would also make it easier to communicate the terminology used on the interface to administrators, since it would be clearer and more concise. They also did not want each rule to keep reporting that it was active, since the spamwall acknowledgements would already tell them that; they were concerned that all 30,000 rules giving feedback would obstruct further processes.
Discussion
What was learned from the pilot study?
The users do react better to a graphical interface, even though they were trained on one that was list-based. This finding may extend to completely new users: since users already familiar with the old interface could quickly pick up the new system, new users may be able to perform with about the same ease. Another lesson learned was that the terminology should not be changed hastily. Because troubleshooting involves both the users of the system and the administrators, they have already established a shared vocabulary for the problems and the tools. Changing that vocabulary would mean not only changing what the users have already learned, but also making people who do not work with the system directly, and who perform other duties, learn a whole new vocabulary.
What would be changed for the experiment
Instead of telling users to perform a task, I will tell them to open a tool or a window and time that process. This may isolate spatial problems with the tools' placement that do not show up during full tasks, where there are too many variables. Rather than do a dry run with the interface, I may simply open up the system and tell a new pool of participants to run through it by themselves; this would show whether the interface is truly intuitive. Other variables may also have contributed to the improved response times and quick learning: the users may have memorized what I did, and since the dry run was domain-neutral, they did not really receive a fresh initial introduction to the system. The informal experiment was also really a "toy-world" run-through. In the formal experiment, I would create a scenario where the system actually goes down and see how the users react to an emergency and whether they utilize the different features of the system.
Changes in the interface due to the experiment
Because the users did not express concern about the initial sizes of the windows, I may enlarge the more important ones, or the ones that are more commonly used, so that users have a larger workspace and can look at more information at once. Since the users did find some of the new terminology difficult, I may change it back and see how they perform.
Formal Experiment Design
Hypothesis
Some windows or tools may show a dramatic difference in user response time when measured individually. Also, if a realistic emergency is staged as a drill, the users' response times may be faster, and their ability to convey the problem at hand may be better, than with the old interface.
Factors and Levels
Independent Variables: The factor varied in the experiment would be the task scenario. Instead of telling users to perform a full task, I would tell them to open certain tools or perform certain functions in isolation from the rest of the system. For example, I would say to the participant, "Find the traceroute tool and bring it up" or "Write a rule and store it." This would allow me to evaluate the placement of each tool and the users' reaction times independently of the other tools. For the emergency situations, I would hard-code a piece of the interface to behave as if it were malfunctioning, then leave the participant alone so that he believes the system is up and running. This would remove the observer's paradox, in which the user looks to the experimenter for clues or acts differently because an observer is present. Because the user believes the situation is real, he may also behave in a more natural manner.
Dependent Variables: In the formal experiment, I will measure the time it takes to open each window and the users' searching techniques. I will also measure the time it takes to resolve an emergency situation and the accuracy of the information conveyed to the administrator when the system goes down.
If the users show a dramatic difference in times across the tasks I give them, then the windows should be reorganized so that everyone reaches an optimal performance level. If not, I may have to look at the system as a whole and at where in each task the users are stumbling. For the emergency situation, if the users have a difficult time assessing the problem and conveying to the higher authority what is wrong, the emergency protocols will have to be redesigned so that the interface "leads" them through such a situation.
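As an illustration of how the per-tool timing data might be summarized to spot such differences, the following is a minimal sketch in Python; the tool names, the placeholder numbers, the 1.5x threshold, and the helper function are hypothetical and not part of the planned interface or the study data:

    from statistics import mean, stdev

    def summarize_tool_times(times_by_tool):
        """Report the mean and spread of open times (seconds) for each tool.

        times_by_tool maps a tool name to the list of open times recorded
        across the participants. Tools whose mean is well above the overall
        mean are flagged as candidates for reorganization.
        """
        overall = mean(t for ts in times_by_tool.values() for t in ts)
        for tool, ts in sorted(times_by_tool.items()):
            m = mean(ts)
            s = stdev(ts) if len(ts) > 1 else 0.0
            flag = "  <-- much slower than average" if m > 1.5 * overall else ""
            print(f"{tool}: mean {m:.1f} s, stdev {s:.1f} s{flag}")

    # Example usage with placeholder numbers (not real study data):
    summarize_tool_times({
        "traceroute": [6.2, 5.8, 7.1, 6.5],
        "rules pending": [3.1, 2.9, 3.4, 3.0],
    })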
Blocking and Repetitions
For the first measurement, I will disable the functionality of all the tools except the single tool being measured. I will ask each individual to find that tool and then move on to the next tool. I will repeat this across ten individuals in order to obtain an accurate estimate. For the second task, I will have to run a single-blind test in which the individual does not know what is going on and believes that the system is up and running. I will repeat this with only five individuals, because the task may be time-consuming and, if the interface is truly unintuitive, that will show immediately.
Appendices
Materials
Instructions
This is an interface I am testing on you. The operations are the same as in any windowing system: you press the X in the corner to close a window, and the buttons on the interface are interactive. I will first go through a dry run of the interface, and then I will let you familiarize yourself with it by "toying" around with it. After that, I will have you perform three tasks of my choice.
Demo Script
Follow-up Interview
Now that you have done the three tasks, what do you think of the interface? Please feel free to make any suggestions or express any difficulties in performing the tasks.
The following were noted in another person's evaluation of the system. Do you agree with these points, and would you like the interface to conform to these recommendations?
Raw Data
Participant A
Task 1
Task 2
Task 3
Participant B
Task 1
Task 2
Task 3
Participant C
Task 1
Task 2
Task 3