Test Measures

In conducting our pilot usability study, we aimed to determine (1) how easy or difficult it was for users to complete the annotation tasks (Tasks 1–5), and (2) which collaborative options (annotation history, discussion, or voting results) played a role in assessing the credibility of the current annotation (Task 6).

We should note that for an avid user of the IMG system (User A), Tasks 1–5 were quite simple, whereas the newcomers (the professional annotators) spent considerably more time becoming familiar with the system. We regret that we were not able to better approximate the target users in this study, but we were able to test with users who were familiar with the annotation process. We timed the users for reference purposes but relied mainly on qualitative measures, such as observing user behavior and conducting follow-up interviews. More specifically, and much as in our earlier lo-fi prototype testing, we studied:

the flow for accessing the system from different entry points;
whether the flow was natural to the users;
whether the information they needed for making an annotation was readily available to them;
whether the interactive "compare and transfer" tool was helpful and/or intuitive;
how they would go about expressing a negative opinion about an existing annotation and whether they would readily vote;
how easily they would make "batch" discussion comments for a list of genes;
what they needed for making an assessment of annotation credibility;
what role the history of the annotation, the discussion, or the voting results played in that assessment.

In conducting the pilot study, we used a "hypothetical gene" to control for differences in user background and experience. We also started each user from exactly the same system state so that the testing conditions were comparable.