A generic statistical approach for spam detection in Online Social Networks

Abstract In this paper, we present a generic statistical approach to identify spam profiles on Online Social Networks (OSNs). Our study is based on real datasets containing both normal and spam profiles crawled from the Facebook and Twitter networks. We have identified a set of 14 generic statistical features, common to both Facebook and Twitter, for identifying spam profiles. For the classification task, we used three classification algorithms (naïve Bayes, JRip, and J48) and evaluated them on both the individual and the combined datasets to establish the discriminative property of the identified features. On the combined dataset, the detection rate (DR) is 0.957 and the false positive rate (FPR) is 0.048; on the Facebook dataset, the DR and FPR are 0.964 and 0.089, respectively; and on the Twitter dataset, they are 0.976 and 0.075, respectively. We have also analyzed the contribution of each individual feature to the detection accuracy for spam profiles. Thereafter, we considered the 7 most discriminative features and proposed a clustering-based approach to identify spam campaigns on the Facebook and Twitter networks.
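The abstract reports DR and FPR without defining them. The sketch below assumes the conventional definitions, with spam as the positive class: DR = TP / (TP + FN) and FPR = FP / (FP + TN). The function name and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the paper's datasets.

```python
def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (detection rate, false positive rate) from confusion-matrix counts.

    Assumes spam is the positive class:
      tp = spam profiles flagged as spam
      fn = spam profiles missed
      fp = normal profiles wrongly flagged as spam
      tn = normal profiles correctly passed
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one spam and one normal profile")
    dr = tp / (tp + fn)    # share of spam profiles that were detected
    fpr = fp / (fp + tn)   # share of normal profiles that were misflagged
    return dr, fpr


# Hypothetical counts, for illustration only (not the paper's data).
dr, fpr = detection_metrics(tp=90, fn=10, fp=5, tn=95)
print(f"DR = {dr:.3f}, FPR = {fpr:.3f}")
```

Under these definitions, a good detector pushes DR toward 1 while keeping FPR near 0, which is how the figures reported above should be read.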
