CONAN: A framework for detecting and handling collusion in crowdsourcing

Abstract: In contrast to the traditional view that individuals should work independently to realize the wisdom of the crowd, crowdsourcing workers often collaborate with each other in task processing, either explicitly or implicitly. Some may even collude to obtain rewards easily, for example by plagiarizing others' answers. Collusion undermines the independence among workers and subverts the benefits of the task redundancy commonly adopted in crowdsourcing. Dealing with collusion is therefore critical for ensuring the quality of crowdsourcing. Existing work usually treats all collusive answers as harmful and simply filters them out once they are detected. However, this is not always the best strategy in practice. In particular, when collusive answers are plagiarized from a worker with good ability, utilizing them rather than simply eliminating them can improve result quality. In this work, we first propose a collusion-aware framework for detecting and properly handling collusion in crowdsourcing. Second, we design a collusion detection method based on a statistical test of the consistency of workers' answers across tasks. Third, we provide a theoretical means to determine when collusive answers should be kept and utilized, and we design a collusion-aware answer aggregation method accordingly. Finally, we conduct a thorough evaluation with both synthetic and real-world datasets, and the results demonstrate the effectiveness of our approach.

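To make the consistency-based detection idea concrete, below is a minimal illustrative sketch, not the paper's actual algorithm: it flags worker pairs whose answer agreement across shared tasks is improbably high under a naive chance-agreement null hypothesis, using a one-sided binomial test. The function name, the uniform chance-agreement assumption, the minimum number of shared tasks, and the significance threshold are all illustrative assumptions introduced here.

```python
from itertools import combinations
from scipy.stats import binomtest

def flag_suspicious_pairs(answers, n_options, min_shared=5, alpha=0.01):
    """Illustrative sketch: flag worker pairs whose agreement across
    shared tasks is improbably high if they had answered independently.

    answers: dict mapping worker id -> {task id: chosen answer}.
    n_options: number of answer choices per task (assumed the same for all tasks).
    min_shared: minimum number of shared tasks required to run the test.
    alpha: significance level for the one-sided test.
    """
    suspicious = []
    for w1, w2 in combinations(answers, 2):
        shared = set(answers[w1]) & set(answers[w2])
        if len(shared) < min_shared:  # too few shared tasks to test reliably
            continue
        agree = sum(answers[w1][t] == answers[w2][t] for t in shared)
        # Null hypothesis: each agreement occurs by chance with prob 1/n_options.
        # (A more faithful model would condition on per-worker accuracy,
        #  since two honest, able workers also agree more than chance.)
        p = binomtest(agree, len(shared), 1.0 / n_options,
                      alternative="greater").pvalue
        if p < alpha:
            suspicious.append((w1, w2, p))
    return suspicious
```

Flagged pairs could then be fed to a downstream aggregation step that decides whether their answers should be discarded or reused, depending on the estimated ability of the copied worker.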