Thwarting Spam on Facebook: Identifying Spam Posts Using Machine Learning Techniques

Spam on online social networks (OSNs) is becoming a prominent problem for their users. Spammers use a range of techniques to deceive OSN users for their own benefit, and Facebook, one of the leading OSNs, is experiencing this problem at an alarming rate. This chapter presents a methodology for separating spam from legitimate (ham) posts using three machine learning classifiers: naïve Bayes (NB), support vector machine (SVM), and random forest (RF). Textual, image, and video features are used together, a combination that earlier researchers did not consider. A total of 1.5 million posts and comments are extracted from archival and real-time Facebook data and then pre-processed using RStudio. Of the 30 features identified, 10 are the most informative for distinguishing spam from ham posts. The entire dataset is shuffled and split into training and testing sets at three different ratios, of which the 80:20 split gives the best result. The RF classifier outperforms NB and SVM, achieving an overall F-measure of 89.4% on the combined feature set.
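
The evaluation protocol described above (shuffle, 80:20 split, F-measure) can be illustrated with a minimal sketch. This is not the chapter's implementation, which is not shown here: the helper names `shuffle_split` and `f_measure`, the fixed seed, and the choice of "spam" as the positive class are assumptions made for illustration, and Python is used only for convenience.

```python
import random


def shuffle_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` and split it into (train, test).

    Hypothetical helper: the 0.8 default mirrors the 80:20 ratio
    reported as best; the seed is an arbitrary illustrative choice.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def f_measure(y_true, y_pred, positive="spam"):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In use, a classifier would be fitted on the first list returned by `shuffle_split` and its predictions on the second list passed to `f_measure` together with the true labels.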
