We report results from a human computation study that tests the extent to which output agreement games outperform traditional methods in improving label quality and motivating voluntary workers on a task with a gold standard. We built an output agreement game that let workers recruited from Amazon's Mechanical Turk label the semantic textual similarity of 20 sentence pairs. To isolate and compare the effects of the game's major components, we created interfaces with different combinations of a gaming environment (G), social interaction (S), and feedback (F). Our results show that the main reason an output agreement game collects more high-quality labels is the gaming environment (scoring system, leaderboard, etc.). On the other hand, a worker is much more motivated to do the task voluntarily if he or she can do it with another worker (i.e., with social interaction). Our analysis gives human computation researchers important insight into how and why the Games with a Purpose (GWAP) method can generate high-quality outcomes and motivate more voluntary workers.