Overview of the crowdsourcing process

The term crowdsourcing was coined roughly a decade ago to describe a method for harnessing the wisdom of the crowd to accomplish two types of tasks: tasks that require human intelligence rather than machines, and tasks that the crowd can complete faster and more cheaply than hired experts. The crowdsourcing process comprises five modules. The first is designing incentives that mobilize the crowd to perform the required task; it is followed by four modules for collecting the information, assuring its quality, and then verifying and aggregating it. Verification and quality control can target the tasks, the collected data, and the participants, for example by having several participants answer the same question or by accepting answers only from experts, so as to avoid errors introduced by unreliable participants. Methods for discovering topic experts identify reliable candidates in the crowd who have relevant experience in the topic at hand; such expert discovery reduces the number of participants needed per question and thus the overall cost. This work summarizes and reviews the methods used to accomplish each processing step; the choice of a specific method, however, remains application dependent.
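
As a minimal illustration of the redundancy-based quality control described above, the sketch below aggregates several participants' answers to the same question by an (optionally weighted) majority vote. The function name, data structures, and reliability weights are hypothetical simplifications introduced for illustration, not the specific method of any particular surveyed work.

```python
from collections import defaultdict

def aggregate_by_majority(answers, reliability=None):
    """Aggregate redundant crowd answers to a single question.

    answers     -- list of (participant_id, label) pairs for one task
    reliability -- optional dict mapping participant_id to a weight in (0, 1];
                   plain majority voting is used when omitted.
    Returns the label with the highest (weighted) vote count.
    """
    votes = defaultdict(float)
    for participant, label in answers:
        weight = reliability.get(participant, 1.0) if reliability else 1.0
        votes[label] += weight
    return max(votes, key=votes.get)

# Example: three crowd workers label the same image; the two more reliable
# workers outvote the unreliable one.
answers = [("w1", "cat"), ("w2", "cat"), ("w3", "dog")]
reliability = {"w1": 0.9, "w2": 0.8, "w3": 0.4}
print(aggregate_by_majority(answers, reliability))  # -> "cat"
```

Weighting votes by an estimated per-participant reliability is one simple way to combine redundancy with participant-level quality control; expert discovery can be seen as the limiting case in which only high-weight participants are asked at all, reducing the required redundancy per question.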
