Human Computation Must Be Reproducible
[1] L. Cronbach. Coefficient alpha and the internal structure of tests, 1951.
[2] Manuel Blum, et al. reCAPTCHA: Human-Based Character Recognition via Web Security Measures, 2008, Science.
[3] R. Alpert, et al. Communications Through Limited-Response Questioning, 1954.
[4] Laura A. Dabbish, et al. Labeling images with a computer game, 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.
[5] David A. Forsyth, et al. Utility data annotation with Amazon Mechanical Turk, 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
[6] Praveen Paritosh, et al. The anatomy of a large-scale human computation engine, 2010, HCOMP '10.
[7] R. Craggs, et al. A two dimensional annotation scheme for emotion in dialogue, 2004.
[8] L. Koran, et al. The reliability of clinical methods, data and judgments (first of two parts), 1975, The New England Journal of Medicine.
[9] W. A. Scott, et al. Reliability of Content Analysis: The Case of Nominal Scale Coding, 1955.
[10] Jacob Cohen, et al. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit, 1968.
[11] C. Eccleston, et al. Systematic review and meta-analysis of randomized controlled trials of cognitive behaviour therapy and behaviour therapy for chronic pain in adults, excluding headache, 1999, Pain.
[12] Stefanie Nowak, et al. How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation, 2010, MIR '10.
[13] Adrien Treuille, et al. Predicting protein structures with a multiplayer online game, 2010, Nature.
[14] K. Krippendorff. Bivariate Agreement Coefficients for Reliability of Data, 1970.
[15] C. P. Hughes, et al. A New Clinical Scale for the Staging of Dementia, 1982, British Journal of Psychiatry.
[16] Brendan T. O'Connor, et al. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks, 2008, EMNLP.
[17] Siddharth Suri, et al. Conducting behavioral research on Amazon’s Mechanical Turk, 2010, Behavior Research Methods.
[18] Omar Alonso, et al. Crowdsourcing for relevance evaluation, 2008, SIGIR Forum.
[19] Klaus Krippendorff, et al. Answering the Call for a Standard Reliability Measure for Coding Data, 2007.
[20] A. Goodman, et al. The reliability of psychiatric diagnosis in Israel's Psychiatric Case Register, 1984, Acta Psychiatrica Scandinavica.
[21] Luis von Ahn. Games with a Purpose, 2006, Computer.
[22] Peng Dai, et al. Decision-Theoretic Control of Crowd-Sourced Workflows, 2010, AAAI.
[23] Lukas Biewald, et al. Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing, 2011, Human Computation.
[24] Roel Popping, et al. On Agreement Indices for Nominal Data, 1988.
[25] H. Kraemer, et al. 2 x 2 kappa coefficients: measures of agreement or association, 1989, Biometrics.
[26] Colin Seymour-Ure, et al. Content Analysis in Communication Research, 1972.
[27] W. Nelson. Statistical Methods for Reliability Data, 1998.
[28] Ron Artstein, et al. Survey Article: Inter-Coder Agreement for Computational Linguistics, 2008, Computational Linguistics.
[29] Praveen Paritosh, et al. Freebase: a collaboratively created graph database for structuring human knowledge, 2008, SIGMOD Conference.
[30] Jacob Cohen. A Coefficient of Agreement for Nominal Scales, 1960.
[31] Aniket Kittur, et al. Crowdsourcing user studies with Mechanical Turk, 2008, CHI.
[32] L. Koran, et al. The reliability of clinical methods, data and judgments (second of two parts), 1975, The New England Journal of Medicine.
[33] T. Marteau, et al. The Place of Inter-Rater Reliability in Qualitative Research: An Empirical Study, 1997.
[34] J. Fleiss. Measuring nominal scale agreement among many raters, 1971.
[35] Panagiotis G. Ipeirotis, et al. Quality management on Amazon Mechanical Turk, 2010, HCOMP '10.
[36] Page Keeley. With a Purpose, 2011.
[37] Barbara Di Eugenio, et al. Squibs and Discussions: The Kappa Statistic: A Second Look, 2004, Computational Linguistics.
[38] J. Bert Keats, et al. Statistical Methods for Reliability Data, 1999.
[39] Esser, et al. Alive and Well after 25 Years: A Review of Groupthink Research, 1998, Organizational Behavior and Human Decision Processes.
[40] A. Aboraya, et al. The Reliability of Psychiatric Diagnosis Revisited: The Clinician's Guide to Improve the Reliability of Psychiatric Diagnosis, 2006, Psychiatry (Edgmont (Pa. : Township)).
[41] A. Feinstein, et al. High agreement but low kappa: I. The problems of two paradoxes, 1990, Journal of Clinical Epidemiology.
[42] Klaus Krippendorff, et al. Content Analysis: An Introduction to Its Methodology, 1980.