Crowd Access Path Optimization: Diversity Matters

Quality assurance is one of the most important challenges in crowdsourcing. Assigning a task to several workers to increase quality through redundant answers can be expensive when the workers are homogeneous sources. Current crowdsourcing platforms overlook this limitation, which results in costly solutions. To achieve desirable cost-quality tradeoffs, it is essential to apply efficient crowd access optimization techniques. Our work argues that optimization needs to be aware of the diversity and correlation of information within groups of individuals, so that crowdsourcing redundancy can be adequately planned beforehand. Based on this intuitive idea, we introduce the Access Path Model (APM), a novel crowd model that leverages the notion of access paths as alternative ways of retrieving information. APM aggregates answers while ensuring high quality and meaningful confidence. Moreover, we devise a greedy optimization algorithm for this model that finds a provably good approximate plan to access the crowd. We evaluate our approach on three crowdsourced datasets that illustrate various aspects of the problem. Our results show that the Access Path Model combined with greedy optimization is cost-efficient and practical for overcoming common difficulties in large-scale crowdsourcing, such as data sparsity and anonymity.
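
To make the idea of planning redundancy under a budget concrete, the following is a minimal sketch of a cost-benefit greedy planner. It is an illustration under stated assumptions, not the algorithm or objective from the paper: the `AccessPath` fields, the geometric `decay` used as a stand-in for correlation among answers from the same path, and the function names are all hypothetical. The sketch only shows why a planner that accounts for diminishing returns within a homogeneous group tends to spread the budget across diverse access paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPath:
    """Hypothetical description of one access path (illustrative only)."""
    name: str
    cost: float   # price of one answer from this path
    value: float  # toy utility of the first answer from this path
    decay: float  # in (0, 1); lower means answers are more correlated


def marginal_gain(path: AccessPath, already_taken: int) -> float:
    """Toy diminishing-returns gain of one more answer from `path`."""
    return path.value * path.decay ** already_taken


def greedy_plan(paths: list[AccessPath], budget: float) -> dict[str, int]:
    """Repeatedly buy the affordable answer with the best gain-per-cost ratio."""
    plan = {p.name: 0 for p in paths}
    remaining = budget
    while True:
        affordable = [p for p in paths if p.cost <= remaining]
        if not affordable:
            break
        best = max(
            affordable,
            key=lambda p: marginal_gain(p, plan[p.name]) / p.cost,
        )
        plan[best.name] += 1
        remaining -= best.cost
    return plan


# Assumed toy inputs: a cheap, highly correlated path and a pricier, diverse one.
paths = [
    AccessPath(name="A", cost=1, value=1.0, decay=0.5),
    AccessPath(name="B", cost=2, value=1.5, decay=0.9),
]
plan = greedy_plan(paths, budget=6)
print(plan)
```

With these assumed numbers the planner buys the first answer from the cheap path, then switches to the more diverse path once the cheap path's marginal gain has halved. The toy utility carries no approximation guarantee of its own; any such guarantee depends on the properties of the actual objective being optimized.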
