Collusion-Proof Result Inference in Crowdsourcing

In traditional crowdsourcing, workers are expected to provide independent answers to tasks in order to ensure answer diversity. However, recent studies show that the crowd is not a collection of independent workers; rather, workers communicate and collaborate with each other. To pursue more rewards with little effort, some workers may collude and submit repeated answers, which degrades the quality of the aggregated results. Nonetheless, few efforts have considered the negative impact of collusion on result inference in crowdsourcing. In this paper, we address the collusion-proof result inference problem for general crowdsourcing tasks on public platforms. To that end, we design a metric, the worker performance change rate, to identify colluded answers by computing the difference in mean worker performance before and after removing the repeated answers. We then incorporate the collusion detection result into existing result inference methods to preserve the quality of the aggregated results even in the presence of collusion. We conducted extensive evaluations of our approach on real-world and synthetic datasets. The experimental results demonstrate the superiority of our approach over state-of-the-art methods.
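Below is a minimal Python sketch of the idea behind the worker performance change rate as described in the abstract: measure mean worker performance against the aggregated labels, remove a suspected group's repeated answers, re-aggregate, and compare. The function and variable names (`majority_vote`, `performance_change_rate`, `suspected_workers`) are hypothetical illustrations, not the paper's actual implementation, and majority vote stands in for whatever inference method is used.

```python
from collections import Counter, defaultdict

def majority_vote(answers):
    """Aggregate answers per task by simple majority vote.
    answers: dict task_id -> dict worker_id -> label
    """
    return {t: Counter(w2l.values()).most_common(1)[0][0]
            for t, w2l in answers.items()}

def mean_worker_accuracy(answers, aggregated):
    """Mean per-worker accuracy measured against the aggregated labels."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, w2l in answers.items():
        for w, label in w2l.items():
            total[w] += 1
            correct[w] += int(label == aggregated[t])
    accs = [correct[w] / total[w] for w in total]
    return sum(accs) / len(accs)

def performance_change_rate(answers, suspected_workers):
    """Relative change in mean worker performance after removing the
    answers of a suspected colluding group (hypothetical helper)."""
    before = mean_worker_accuracy(answers, majority_vote(answers))
    filtered = {t: {w: l for w, l in w2l.items() if w not in suspected_workers}
                for t, w2l in answers.items()}
    filtered = {t: w2l for t, w2l in filtered.items() if w2l}  # drop emptied tasks
    after = mean_worker_accuracy(filtered, majority_vote(filtered))
    return (after - before) / before
```

Under this sketch, a markedly positive change rate after removing a group's answers would flag that group as likely colluding, and its answers would be excluded before the final result inference step.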
