This experiment is designed to assess various ways of presenting the rating scale used
to rate the courses and instructors. The course and instructor
rating information is a fundamental component of the SIMians Course
Comment Forum. We therefore would like to design scales that are
intuitive and that can be quickly understood by the user with a high
degree of accuracy.
We have already informally tested two different versions of the scale. In our
low-fi prototype and first interactive prototype, we provided a strictly
graphical rating system. This system allowed the user to assign a rating
consisting of "thumbs-up" images or "thumbs-down" images (with three
"thumbs-down" representing the worst possible rating and three "thumbs-up"
representing the best possible rating). The heuristic evaluation
performed by the McInterface group reported that this rating scale was
difficult to understand.
We subsequently changed the rating scale for the second interactive prototype.
The current system features a strictly numeric scale where the user
selects a number from 1 to 5 (with 1 representing not recommended or
low difficulty, and 5 representing highly recommended or high difficulty).
Although the results of the informal usability tests for the second
prototype suggest that this is a better rating system, it would be useful
to formally test various ways of presenting the rating scale.
Hypotheses:
Hypothesis 1A: A strictly numeric system will
be understood by the user more quickly and more accurately than a strictly
graphic rating system.
The informal usability test results suggest that the numeric system is superior
to the purely graphical system. This may have been a function
of the "thumbs-up/thumbs-down" metaphor, which may not be a familiar
concept to some users. However, even if a more familiar metaphor
were used, such as a scale consisting of a varying number of stars (or
some other icon such as pencils or books), we feel that it would be
easier for users to comprehend the difference in meaning between a higher
number and a lower number than between a larger set and a smaller set of
image objects.
Hypothesis 1B: A combination graphical and numeric rating
scale will be understood by the user more quickly and more accurately
than either a strictly graphical or a strictly numeric rating system.
We feel that a rating scale that combines both numbers and images would
be an optimal rating scale since the images could be used to visually
represent the meaning of the numbers, but the numbers would provide
greater clarity.
Hypothesis 2A: A scale of 1 to 3 will come closer to
capturing the user's "true" rating than a scale of 1 to 10.
One tester of the second interactive prototype explicitly stated that a scale
of 1 to 10 would be too large. We think that a user will have
less difficulty choosing a rating that reflects his/her true opinion with
a smaller rating scale than with a larger one.
Hypothesis 2B: A scale of 1 to 5 will come closer to capturing the
user's "true" rating than either a scale of 1 to 3 or a scale of 1
to 10.
We feel that there would be a lower limit to the size of the scale, and
that a scale of 1 to 3 would not provide adequate granularity for the
user to choose a rating reflecting his/her true opinion.
Factors (Independent Variables):
Factor 1- Presentation of the scale [Between-Subjects]:
- a strictly numeric scale [P-N]
- a strictly graphical scale (e.g., images such as stars) [P-G]
- a combination numeric and graphical scale (e.g., images combined with
a corresponding number) [P-NG]
Since the heuristic
evaluation showed that the use of thumbs was not an appropriate metaphor
for the scales, we would choose another image, perhaps stars (the most
familiar icon for a rating system) or a thematic icon such as pencils
or books (which would be related to the topic of the website).
Factor 2- Span of the scale [Within-Subjects]:
- a scale of 1 to 3 [S-3]
- a scale of 1 to 5 [S-5]
- a scale of 1 to 10 [S-10]
Response Variables:
Response Variable 1- The amount of time it takes for the user to assign
ratings for course difficulty and instructor.
Response Variable 2- Whether the user chooses not to rate the course difficulty
and/or the instructor.
Response Variable 3- Whether the user asks for clarification regarding
the rating system during the test.
Response Variable 4- The user's satisfaction with the scale, as measured
post-test using Likert scales.
Response Variable 5- The user's opinion regarding whether the ratings
he/she assigned accurately reflected his/her opinions
of the course difficulty and instructor, as measured post-test using
Likert scales.
Blocking and Repetitions:
Table 1: Blocking of Experiment by Factor Level
Presentation of the Scale | P-N  | P-G  | P-NG
Span of the Scale         | S-3  | S-5  | S-10
                          | S-10 | S-3  | S-5
                          | S-5  | S-10 | S-3
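Table 1 has the form of a 3x3 Latin square: each span level appears exactly once in every row and once under every presentation level. As a minimal sketch (the function name `latin_square` is a hypothetical helper, not part of the experiment materials), such a layout can be generated by cyclically rotating the list of span levels:

```python
from collections import deque

# Factor level codes as defined in the Factors section.
PRESENTATIONS = ["P-N", "P-G", "P-NG"]
SPANS = ["S-3", "S-5", "S-10"]


def latin_square(levels):
    """Return len(levels) rows; each row is the previous one rotated right by one."""
    row = deque(levels)
    rows = []
    for _ in levels:
        rows.append(list(row))
        row.rotate(1)  # the last element moves to the front
    return rows


square = latin_square(SPANS)

# Print the layout with one column per presentation level.
print(" | ".join(PRESENTATIONS))
for row in square:
    print(" | ".join(row))
```

Cyclic rotation is only one way to build a Latin square; it is used here because it happens to reproduce the rows of Table 1.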
Given the nature
of the website, we would prefer to limit the testers to SIMS students.
Optimistically, we could test approximately half of the student
body, or about 36 subjects. This would come out to four repetitions
of each combination of scale presentation and scale span. That is,
there would be 12 repetitions for each scale presentation level and
12 repetitions for each scale span level. Realistically, we might
only be able to recruit about a fourth of the SIMS student body (about 18
subjects), which would cut the number of repetitions per cell in half, to two.
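The repetition counts above can be checked with a short calculation. This is a sketch only: the function name `repetitions` is hypothetical, the figure of 18 subjects is derived from "a fourth of the student body" given that half is about 36, and the code counts one observation per subject because that is the counting that reproduces the figures in the text.

```python
N_PRESENTATIONS = 3  # P-N, P-G, P-NG
N_SPANS = 3          # S-3, S-5, S-10


def repetitions(n_subjects):
    """Return (per cell, per presentation level, per span level) counts.

    Assumes one observation per subject, spread evenly over the 3 x 3 cells.
    """
    n_cells = N_PRESENTATIONS * N_SPANS
    per_cell = n_subjects // n_cells
    per_presentation = per_cell * N_SPANS       # cells sharing one presentation level
    per_span = per_cell * N_PRESENTATIONS       # cells sharing one span level
    return per_cell, per_presentation, per_span


optimistic = repetitions(36)  # about half of the student body
realistic = repetitions(18)   # about a fourth (derived assumption)

print("optimistic:", optimistic)
print("realistic:", realistic)
```

If every subject instead rated with all three spans, as a within-subjects factor allows, the number of observations per span level would be larger than this sketch reports.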