SLADE: A Smart Large-Scale Task Decomposer in Crowdsourcing

Crowdsourcing has been shown to be effective in a wide range of applications, and is seeing increasing use. A large-scale crowdsourcing task often consists of thousands or millions of atomic tasks, each of which is usually a simple task such as binary choice or simple voting. To distribute a large-scale crowdsourcing task to a limited number of crowd workers, a common practice is to pack a set of atomic tasks into a task bin and send it to a crowd worker as a batch. It is challenging to decompose a large-scale crowdsourcing task and execute batches of atomic tasks in a way that ensures reliable answers at a minimal total cost. Large batches lead to unreliable answers for atomic tasks, while small batches incur unnecessary cost. In this paper, we investigate a general crowdsourcing task decomposition problem, called the Smart Large-scAle task DEcomposer (SLADE) problem, which aims to decompose a large-scale crowdsourcing task to achieve the desired reliability at a minimal cost. We prove the NP-hardness of the SLADE problem and propose solutions in both homogeneous and heterogeneous scenarios. For the homogeneous SLADE problem, where all the atomic tasks share the same reliability requirement, we propose a greedy heuristic algorithm and an efficient and effective approximation framework that uses an optimal priority queue (OPQ) structure with a provable approximation ratio. For the heterogeneous SLADE problem, where the atomic tasks can have different reliability requirements, we extend the OPQ-based framework with a partition strategy and also prove its approximation guarantee. Finally, we verify the effectiveness and efficiency of the proposed solutions through extensive experiments on representative crowdsourcing platforms.
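To make the trade-off between batch size, reliability, and cost concrete, the following is a minimal sketch of a naive baseline for the homogeneous case. It is not the greedy or OPQ-based algorithm of the paper. Everything in it is an assumption made for illustration: the per-size confidence and cost values, the rule that a task answered in m bins of confidence r has reliability 1 - (1 - r)^m, the restriction to a single bin size, and the function names `copies_needed`, `single_size_plan`, and `best_single_size`.

```python
def copies_needed(confidence, threshold):
    """Smallest m with 1 - (1 - confidence)**m >= threshold.

    Assumed reliability model: each assignment of a task to a bin
    independently yields a correct answer with probability `confidence`.
    """
    if not (0 < confidence <= 1 and 0 < threshold < 1):
        raise ValueError("need 0 < confidence <= 1 and 0 < threshold < 1")
    m = 1
    while 1 - (1 - confidence) ** m < threshold:
        m += 1
    return m


def single_size_plan(n_tasks, bin_size, confidence, bin_cost, threshold):
    """Decompose n_tasks into bins of one fixed size; return (bins, cost)."""
    if bin_size > n_tasks:
        raise ValueError("bin_size must not exceed n_tasks")
    m = copies_needed(confidence, threshold)
    # Repeat the task list m times and cut it into consecutive chunks.
    # Because bin_size <= n_tasks, no chunk contains the same task twice.
    slots = [t for _ in range(m) for t in range(n_tasks)]
    bins = [slots[i:i + bin_size] for i in range(0, len(slots), bin_size)]
    # A partially filled last bin is charged the full bin cost.
    return bins, len(bins) * bin_cost


def best_single_size(n_tasks, options, threshold):
    """Try every bin size in `options` and keep the cheapest plan.

    options maps bin_size -> (confidence, cost_in_cents); hypothetical data.
    """
    best = None
    for size, (confidence, cost) in sorted(options.items()):
        if size > n_tasks:
            continue
        bins, total = single_size_plan(n_tasks, size, confidence, cost, threshold)
        if best is None or total < best[2]:
            best = (size, bins, total)
    return best


# Hypothetical setting: larger bins are cheaper per task but less reliable.
options = {1: (0.9, 10), 2: (0.8, 15), 3: (0.6, 18)}
size, bins, total = best_single_size(6, options, threshold=0.95)
print(size, total)
```

In this toy setting, bins of size 1 are reliable but numerous, and bins of size 3 are cheap per task but need many repetitions to reach the threshold, so the middle size ends up cheapest. Mixing several bin sizes in one plan, which this baseline does not attempt, is what makes the general problem hard.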
