Incremental Quality Inference in Crowdsourcing

Crowdsourcing has attracted significant attention from the database community in recent years, and several crowdsourced databases have been proposed to incorporate human power into traditional database systems. A major issue in crowdsourcing is achieving high quality, because workers may return incorrect answers. A typical solution is to assign each question to multiple workers and combine the workers' answers to generate the final result. A key challenge in this strategy is inferring each worker's quality. Existing methods usually assume that each worker has a fixed quality and estimate it using qualification tests or historical performance; however, these methods cannot accurately estimate a worker's quality. To address this problem, we propose a worker model and devise an incremental inference strategy to accurately compute workers' quality. We also propose a question model and develop two efficient strategies that combine it with the worker model to compute each question's result. We implement our method and compare it with existing inference approaches on real crowdsourcing platforms using real-world datasets; the experiments show that our method achieves high accuracy and outperforms existing approaches.
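
To make the general idea of combining multiple workers' answers under a worker-quality model concrete, the following is a minimal, illustrative sketch, not the paper's actual algorithm. It assumes each worker's quality is a single accuracy estimate tracked with simple Beta-style counts, uses quality-weighted voting to infer a question's result, and then incrementally updates each worker's quality from agreement with that result; the class name, method names, and the update rule are assumptions introduced here for illustration only.

```python
from collections import defaultdict

class IncrementalQualityEstimator:
    """Illustrative sketch (not the paper's method): quality-weighted voting
    with incremental worker-quality updates."""

    def __init__(self, prior_correct=1.0, prior_total=2.0):
        # Every worker starts at quality prior_correct / prior_total (0.5 here).
        self.correct = defaultdict(lambda: prior_correct)
        self.total = defaultdict(lambda: prior_total)

    def quality(self, worker):
        # Estimated probability that this worker answers correctly.
        return self.correct[worker] / self.total[worker]

    def infer_answer(self, answers):
        """answers: dict mapping worker id -> label.
        Returns the label with the highest quality-weighted support."""
        scores = defaultdict(float)
        for worker, label in answers.items():
            scores[label] += self.quality(worker)
        return max(scores, key=scores.get)

    def update(self, answers, inferred_label):
        """Incrementally refine each worker's quality counts based on whether
        the worker agreed with the inferred result."""
        for worker, label in answers.items():
            self.total[worker] += 1.0
            if label == inferred_label:
                self.correct[worker] += 1.0


# Usage: process questions one at a time, inferring each result and then
# updating worker qualities from that result.
est = IncrementalQualityEstimator()
for answers in [
    {"w1": "A", "w2": "A", "w3": "B"},
    {"w1": "B", "w2": "B", "w3": "B"},
]:
    result = est.infer_answer(answers)
    est.update(answers, result)
    print(result, {w: round(est.quality(w), 2) for w in answers})
```

In this toy setup, workers who repeatedly agree with the inferred results accumulate higher quality estimates and therefore carry more weight in later votes, which is the intuition behind quality-aware answer aggregation.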
