Characterizing and automatically detecting crowdturfing in Fiverr and Twitter

As human computation on crowdsourcing systems has become popular and powerful for performing tasks, malicious users have started misusing these systems by posting malicious tasks, propagating manipulated content, and targeting popular web services such as online social networks and search engines. Recently, these malicious users have moved to Fiverr, a fast-growing micro-task marketplace, where workers can post crowdturfing tasks (i.e., astroturfing campaigns run by crowd workers) and malicious customers can purchase those tasks for only $5. In this manuscript, we present a comprehensive analysis of crowdturfing in Fiverr and Twitter and develop predictive models to detect and prevent crowdturfing tasks in Fiverr and malicious crowd workers in Twitter. First, we identify the most popular types of crowdturfing tasks found in Fiverr and conduct case studies of these tasks. Second, we build crowdturfing task detection classifiers to filter these tasks and prevent them from becoming active in the marketplace. Our experimental results show that the proposed classification approach effectively detects crowdturfing tasks, achieving 97.35% accuracy. Third, we analyze the real-world impact of crowdturfing tasks by purchasing active Fiverr tasks and quantifying their impact on a target site (Twitter). As part of this analysis, we show that current security systems inadequately detect crowdsourced manipulation, which confirms the necessity of our proposed crowdturfing task detection approach. Finally, we analyze the characteristics of paid Twitter workers, find features that distinguish these workers from legitimate Twitter accounts, and use these features to build classifiers that detect Twitter workers. Our experimental results show that our classifiers detect Twitter workers effectively, achieving 99.29% accuracy.

Kyumin Lee | Steve Webb | Hancheng Ge
