

Formal Experiment Design


This experiment is designed to assess various ways of presenting the rating scale used to rate the courses and instructors.  The course and instructor rating information is a fundamental component of the SIMians Course Comment Forum.  We therefore would like to design scales that are intuitive and that can be quickly understood by the user with a high degree of accuracy.

We have already informally tested two different versions of the scale.  In our low-fi prototype and first interactive prototype, we provided a strictly graphical rating system. This system allowed the user to assign a rating consisting of "thumbs-up" images or "thumbs-down" images (with three "thumbs-down" representing the worst possible rating and three "thumbs-up" representing the best possible rating).  The heuristic evaluation performed by the McInterface group reported that this rating scale was difficult to understand. 

We subsequently changed the rating scale for the second interactive prototype.  The current system features a strictly numeric scale where the user selects a number from 1 to 5 (with 1 representing not recommended or low difficulty, and 5 representing highly recommended or high difficulty).  Although the results of the informal usability tests for the second prototype suggest that this is a better rating system, it would be useful to formally test various ways of presenting the rating scale.
 
 

Hypotheses:

Hypothesis 1A:  A strictly numeric rating system will be understood by the user more quickly and more accurately than a strictly graphical rating system.
The informal usability test results suggest that the numeric system is superior to the purely graphical system.  This may have been a function of the "thumbs-up/thumbs-down" metaphor, which may not be a familiar concept to some users.  However, even if a more familiar metaphor were used, such as a scale consisting of a varying number of stars (or some other icon such as pencils or books), we feel that it would be easier for users to comprehend the difference in meaning between a higher number and a lower number than between a larger set and a smaller set of image objects.
 
Hypothesis 1B:  A combination graphical and numeric rating scale will be understood by the user more quickly and more accurately than either a strictly graphical or a strictly numeric rating system.
We feel that a rating scale that combines both numbers and images would be an optimal rating scale since the images could be used to visually represent the meaning of the numbers, but the numbers would provide greater clarity.
 
Hypothesis 2A:  A scale of 1 to 3 will come closer to capturing the user's "true" rating than a scale of 1 to 10.
One tester of the second interactive prototype explicitly stated that a scale of 1 to 10 would be too large.  We think that a user will have less difficulty choosing a rating reflecting his/her true opinion with a smaller rating scale than with a larger one.

Hypothesis 2B:  A scale of 1 to 5 will come closer to capturing the user's "true" rating than either a scale of 1 to 3 or a scale of 1 to 10.

We feel that there is a lower limit to the size of the scale, and that a scale of 1 to 3 would not provide adequate granularity for the user to choose a rating reflecting his/her true opinion.
 
 
Factors (Independent Variables):

Factor 1- Presentation of the scale [Between-Subjects]:

  • a strictly numeric scale [P-N]
  • a strictly graphical scale (e.g., images such as stars, etc.) [P-G]
  • a combination numeric and graphical scale (e.g., images combined with a corresponding number) [P-NG]
Since the heuristic evaluation showed that the use of thumbs was not an appropriate metaphor for the scales, we would choose another image, perhaps stars (the most familiar icon for a rating system) or a thematic icon such as pencils or books (which would be related to the topic of the website).

Factor 2- Span of the scale [Within-Subjects]:

  • a scale of 1 to 3    [S-3]
  • a scale of 1 to 5    [S-5]
  • a scale of 1 to 10  [S-10]
 
Response Variables:

Response Variable 1- The amount of time it takes for the user to assign ratings for course difficulty and instructor.

Response Variable 2- Whether the user chooses not to rate the course difficulty and/or the instructor.

Response Variable 3- Whether the user asks for clarification regarding the rating system during the test.

Response Variable 4- The user's satisfaction with the scale, as measured post-test using Likert scales.

Response Variable 5- Whether the user felt that the ratings he/she assigned accurately reflected his/her opinions of the course difficulty and instructor, as measured post-test using Likert scales.


Blocking and Repetitions:

Table 1:  Blocking of Experiment by Factor Level

Presentation of the Scale:   P-N     P-G     P-NG
Span of the Scale:           S-3     S-5     S-10
                             S-10    S-3     S-5
                             S-5     S-10    S-3
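
The span entries in Table 1 form a 3 x 3 Latin square: each span level appears exactly once in every row and once under every presentation level. As a minimal sketch (the function name latin_square and the rotation rule are our own illustration, not part of the original design), the same arrangement can be generated by rotating the list of span levels one position to the right for each successive row:

```python
# Illustrative sketch only: the names below are assumptions, not project code.
PRESENTATIONS = ["P-N", "P-G", "P-NG"]
SPANS = ["S-3", "S-5", "S-10"]


def latin_square(levels):
    """Return rows where row i is `levels` rotated right by i positions."""
    n = len(levels)
    return [[levels[(col - row) % n] for col in range(n)] for row in range(n)]


square = latin_square(SPANS)

# Print the square under the presentation-level headings.
print("  ".join(f"{p:<5}" for p in PRESENTATIONS))
for row in square:
    print("  ".join(f"{cell:<5}" for cell in row))

# Latin-square property: every row and every column holds each level once.
columns = [list(col) for col in zip(*square)]
is_latin = all(sorted(line) == sorted(SPANS) for line in square + columns)
```

Rotation is only one way to build such a square; any arrangement with the same row and column property would balance the span levels equally well.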

Given the nature of the website, we would prefer to limit the testers to SIMS students.  Optimistically, we could test approximately half of the student body, or about 36 subjects.  This would yield four repetitions of each scale presentation and scale span combination (36 subjects across 9 combinations).  That is, there would be 12 repetitions for each scale presentation level and 12 repetitions for each scale span level.  Realistically, we might only be able to recruit about a fourth of the SIMS student body (about 18 subjects), which would halve the number of repetitions per cell to two.
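
The repetition counts above can be checked with a short sketch. It assumes, as the arithmetic in the paragraph implies, that each subject is counted in exactly one presentation/span cell. The round-robin assignment and the function name repetitions are hypothetical conveniences for counting; an actual study would presumably assign subjects to cells at random.

```python
from collections import Counter
from itertools import cycle, islice

PRESENTATIONS = ["P-N", "P-G", "P-NG"]
SPANS = ["S-3", "S-5", "S-10"]

# The nine presentation/span combinations (cells) of the design.
CELLS = [(p, s) for p in PRESENTATIONS for s in SPANS]


def repetitions(num_subjects):
    """Count subjects per cell, per presentation level, and per span level.

    Subjects are dealt out to cells round-robin (an assumption made here
    purely so the counts come out balanced).
    """
    assignment = list(islice(cycle(CELLS), num_subjects))
    per_cell = Counter(assignment)
    per_presentation = Counter(p for p, _ in assignment)
    per_span = Counter(s for _, s in assignment)
    return per_cell, per_presentation, per_span


for n in (36, 18):  # roughly half and a fourth of the student body
    per_cell, per_presentation, per_span = repetitions(n)
    print(n, "subjects:")
    print("  per cell:        ", sorted(set(per_cell.values())))
    print("  per presentation:", dict(per_presentation))
    print("  per span:        ", dict(per_span))
```

With 36 subjects this gives 4 per cell and 12 per factor level, matching the figures in the text; with 18 subjects it gives 2 per cell and 6 per factor level.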


Last Modified: Apr-23-2001

Copyright 2001: Linda Duffy, Jean-Anne Fitzpatrick, Sonia Klemperer-Johnson, James Reffell