Acquiring High Quality Non-Expert Knowledge from On-Demand Workforce

Because it is expensive and time-consuming, human knowledge acquisition has consistently been a major bottleneck for solving real problems. In this paper, we present a practical framework for acquiring high-quality non-expert knowledge from an on-demand workforce using Amazon Mechanical Turk (MTurk). We show how to apply this framework to collect large-scale human knowledge for AOL query classification quickly and efficiently. Through extensive experiments and analysis, we demonstrate how to detect low-quality labels in massive data sets and assess their impact on collecting high-quality knowledge. Our experimental findings also provide insight into best practices for balancing cost and data quality when using MTurk.
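To make the low-quality-label detection step concrete, the sketch below shows one common agreement-based filter: collect redundant labels per item, take the per-item majority vote, and flag workers whose agreement with that consensus falls below a cutoff. This is an illustrative sketch only, not the paper's exact method; the function names (`majority_labels`, `flag_low_quality_workers`) and the 0.7 threshold are assumptions for the example.

```python
# Illustrative agreement-based quality filter for redundant MTurk labels.
# Assumption: each item is labeled by several workers; labels arrive as
# (worker_id, item_id, label) tuples. The 0.7 cutoff is a placeholder.
from collections import Counter, defaultdict


def majority_labels(labels):
    """Map item_id -> the most common label among its redundant labels."""
    votes = defaultdict(Counter)
    for _, item_id, label in labels:
        votes[item_id][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


def flag_low_quality_workers(labels, min_agreement=0.7):
    """Return worker_ids whose agreement with the per-item majority vote
    falls below `min_agreement` (an assumed cutoff)."""
    labels = list(labels)  # allow a second pass over the data
    consensus = majority_labels(labels)
    agree, total = Counter(), Counter()
    for worker, item, label in labels:
        total[worker] += 1
        agree[worker] += int(label == consensus[item])
    return {w for w in total if agree[w] / total[w] < min_agreement}


if __name__ == "__main__":
    # Toy example: worker "w3" disagrees with the consensus on both queries.
    data = [
        ("w1", "q1", "navigational"), ("w2", "q1", "navigational"),
        ("w3", "q1", "informational"),
        ("w1", "q2", "informational"), ("w2", "q2", "informational"),
        ("w3", "q2", "navigational"),
    ]
    print(flag_low_quality_workers(data))  # -> {'w3'}
```

The design choice here mirrors the cost-quality trade-off the abstract highlights: buying redundant labels for each item costs more, but majority voting over them lets low-quality labels be detected and filtered without expert review.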
