Paper Information - Human Computation Must Be Reproducible

Human Computation Must Be Reproducible

Human computation is the technique of performing a computational process by outsourcing some of the difficult-to-automate steps to humans. In the social and behavioral sciences, when humans are used as measuring instruments, reproducibility guides the design and evaluation of experiments. We argue that human computation has similar properties, and that the results of human computation must, at the least, be reproducible in order to be informative. We might additionally require the results of human computation to have high validity or high utility, but the results must be reproducible in order to measure the validity or utility to a degree better than chance. Additionally, a focus on reproducibility has implications for the design of tasks and instructions, as well as for the communication of the results. It is humbling how often the initial understanding of the task and guidelines turns out to lack reproducibility. We suggest ensuring, measuring, and communicating the reproducibility of human computation tasks.

Praveen K. Paritosh
