Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation

Big data is having a disruptive impact across the sciences. Human annotation of semantic interpretation tasks is a critical part of big data semantics, but it is based on an antiquated ideal of a single correct truth that needs to be similarly disrupted. We expose seven myths about human annotation, most of which derive from that antiquated ideal of truth, and dispel these myths with examples from our research. We propose a new theory of truth, crowd truth, based on the intuition that human interpretation is subjective, and that measuring annotations on the same objects of interpretation (in our examples, sentences) across a crowd provides a useful representation of their subjectivity and the range of reasonable interpretations.

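To make the intuition concrete, here is a minimal Python sketch of one way to aggregate crowd annotations per sentence and give each candidate interpretation a graded score reflecting how strongly the crowd expresses it. The function names (sentence_vector, relation_score) and the example relation labels are illustrative assumptions; the precise crowd truth measures are those defined in the authors' Crowd Truth work, not this sketch.

    from collections import Counter
    import math

    def sentence_vector(worker_annotations):
        # Aggregate per-worker label sets for one sentence into a count vector:
        # each label maps to the number of workers who selected it.
        vec = Counter()
        for labels in worker_annotations:
            vec.update(set(labels))
        return vec

    def relation_score(sentence_vec, label):
        # Cosine similarity between the sentence vector and the unit vector
        # for a single label: a graded measure of how clearly the crowd
        # expresses that interpretation for this sentence.
        norm = math.sqrt(sum(v * v for v in sentence_vec.values()))
        return sentence_vec[label] / norm if norm else 0.0

    # Hypothetical example: five workers annotate one sentence with candidate relations.
    workers = [{"treats"}, {"treats"}, {"treats", "prevents"}, {"prevents"}, {"none"}]
    vec = sentence_vector(workers)
    for label in ("treats", "prevents", "none"):
        print(label, round(relation_score(vec, label), 2))

Rather than forcing a single gold label per sentence, the spread of scores across labels is kept as signal: it indicates how clear the sentence is and how ambiguous each candidate interpretation is to the crowd.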