SpamSpotter: An efficient spammer detection framework based on intelligent decision support system on Facebook

Abstract: Facebook is one of the most popular online social network services. As the number of Facebook users grows, so does the likelihood of spam content being broadcast on the platform. A few techniques exist to combat spam on Facebook. However, because critical pieces of Facebook information, such as profiles, network information, and an unlimited number of posts, are not publicly available, the existing techniques fail to detect many spammers. In this paper, we propose an efficient spammer detection framework, called SpamSpotter, that distinguishes spammers from legitimate users on Facebook. Based on Facebook's recent characteristics, the framework introduces a novel feature set to facilitate spammer detection. We use a baseline dataset from Facebook that includes 300 spammer profiles and 700 legitimate user profiles. The baseline dataset contains a set of features for each profile, extracted using a novel dataset construction mechanism. In addition, we design an intelligent decision support system that applies eight different machine learning classifiers to the baseline dataset to distinguish spammers from legitimate users. To evaluate the efficiency and accuracy of the proposed framework, we implemented it and compared it with existing frameworks. The evaluation results demonstrate that the proposed framework is both accurate and efficient. It attains an accuracy of 0.984 and a Matthews correlation coefficient of 0.977.
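For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and the Matthews correlation coefficient are computed from a binary confusion matrix. The function names and the confusion counts are hypothetical and chosen only to match the 300/700 class split of the baseline dataset; they are not the paper's results or its implementation.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all profiles that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (assumption, not from the paper):
# 300 spammers (positive class), 700 legitimate users (negative class).
tp, fn = 290, 10   # spammers caught / spammers missed
tn, fp = 690, 10   # legitimate users kept / legitimate users flagged

print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Because the dataset is imbalanced (30% spammers), the Matthews correlation coefficient is a useful complement to accuracy: it uses all four cells of the confusion matrix and is not inflated by the majority class.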
