Key Research Issues and Related Technologies in Crowdsourcing Data Collection

Crowdsourcing provides a distributed way to solve tasks that are difficult for computers and require human intelligence. Because it is fast and inexpensive, crowdsourcing is widely used to collect metadata and data annotations in many fields, such as information retrieval, machine learning, recommendation systems, and natural language processing. By enabling the collection of rich, large-scale data, crowdsourcing promotes the development of data-driven research. In recent years, considerable effort has been devoted to crowdsourced data collection, addressing challenges that include quality control, cost control, efficiency, and privacy protection. In this paper, we introduce the concept and workflow of crowdsourcing data collection. We then review the key research topics and related technologies at each stage of the workflow: task design, task-worker matching, response aggregation, incentive mechanisms, and privacy protection. Finally, we discuss the limitations of existing work and identify directions for future development.
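To make the response-aggregation stage concrete, the sketch below shows the simplest baseline, majority voting over redundant worker labels, in Python. This is an illustrative example only, not code from the paper; the data layout (a mapping from task IDs to lists of worker labels) is an assumption chosen for clarity.

```python
from collections import Counter

def majority_vote(responses):
    """Aggregate redundant crowd labels by majority vote.

    `responses` maps each task ID to the list of labels submitted by
    different workers for that task (an assumed layout for illustration).
    Returns, per task, the most frequent label (ties broken arbitrarily)
    and the fraction of workers who agreed with it, which can serve as a
    crude confidence score.
    """
    aggregated = {}
    for task_id, labels in responses.items():
        label, count = Counter(labels).most_common(1)[0]
        aggregated[task_id] = (label, round(count / len(labels), 2))
    return aggregated

# Example: three workers label two images as "cat" or "dog".
responses = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
}
print(majority_vote(responses))
# -> {'img_001': ('cat', 0.67), 'img_002': ('dog', 1.0)}
```

Majority voting treats every worker as equally reliable; probabilistic truth-inference models such as Dawid-Skene instead estimate per-worker error rates with EM and typically aggregate more accurately, at the cost of extra computation.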
