Test Measures

In conducting our pilot usability study, we aimed to determine (1) how easy (or difficult) it was for the users to complete the annotation tasks (Tasks 1–5), and (2) which collaborative options (annotation history, discussion, or voting results) played a role in assessing the credibility of the current annotation (Task 6).

We should note that for an avid user of the IMG system (User A), Tasks 1–5 were quite simple, whereas the newcomers (the professional annotators) spent considerably more time becoming familiar with the system. We regret that we were not able to better approximate the target users in this study, but we were able to conduct the testing with users who were familiar with the annotation process. We timed the users for reference purposes but relied mainly on qualitative measures such as observing user behavior and conducting follow-up interviews. More specifically, very similar to our earlier lo-fi prototype testing, we studied

In conducting the pilot study, we used a "hypothetical gene" to control for differences in user background and experience. We also started each user from exactly the same system state to keep the testing conditions similar.