Detecting Crowdsourcing Spammers in Community Question Answering Websites

The growth of online crowdsourcing marketplaces has attracted large numbers of legitimate buyers and micro-workers, as well as campaigners and malicious users who post spamming jobs. Because of its significant role in information seeking and sharing, Community Question Answering (CQA) has become a target of crowdsourcing spammers. In this paper, we develop a solution for detecting crowdsourcing spammers on CQA websites. Using ground-truth data, we conduct a hybrid analysis that combines non-semantic and semantic analysis over a set of distinctive features (e.g., profile features, social network features, content features, and linguistic features). Building on these features, we develop a supervised machine learning solution for detecting crowdsourcing spammers in CQA. Our method achieves an AUC (area under the receiver operating characteristic curve) of 0.995 and an \(F_{1}\) score of 0.967, significantly outperforming existing work.
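For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how AUC and \(F_{1}\) can be computed for a binary spammer classifier. It is not the evaluation code used in this paper: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class (1 = spammer)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a small illustration but not for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = crowdsourcing spammer, 0 = normal user.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]  # assumed classifier outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

print(f"F1  = {f1_score(y_true, y_pred):.4f}")
print(f"AUC = {roc_auc(y_true, scores):.4f}")
```

AUC is computed from the raw scores and so does not depend on the threshold, whereas \(F_{1}\) is computed from the thresholded predictions. This is why the two metrics are usually reported together.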
