Detecting Spam Accounts on Twitter

Social networks have become a popular way for internet users to interact with friends and family, read news, and discuss events. Users spend increasing amounts of time on well-known social platforms (e.g., Facebook and Twitter), storing and sharing their personal information. This information, together with the opportunity to contact thousands of users, attracts the interest of malicious users. They exploit the implicit trust relationships between users to achieve their malicious aims, for example, embedding malicious links in posts/tweets, spreading fake news, and sending unsolicited messages to legitimate users. In this paper, we investigate the nature of spam users on Twitter with the goal of improving existing spam detection mechanisms. To detect Twitter spammers, we use several new features that are more effective and robust than commonly used existing features (e.g., number of followings/followers). We evaluated the proposed set of features using popular machine learning classification algorithms, namely k-Nearest Neighbor (k-NN), Decision Tree (DT), Naive Bayes (NB), Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGBoost). The performance of these classifiers is evaluated and compared using different evaluation metrics. We compared the performance of our proposed approach with four recent state-of-the-art approaches. The experimental results show that the proposed set of features gives better performance than existing state-of-the-art approaches.
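The evaluation workflow described above (classify accounts from per-account features, then score the classifier with standard metrics) can be illustrated with a minimal sketch. It uses only one of the listed classifiers, k-NN, written in plain Python. The two features (followers-to-followings ratio and fraction of tweets containing a URL), the toy data, and the function names `knn_predict` and `precision_recall_f1` are assumptions made for illustration; they are not the feature set or the code used in the paper.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data, not taken from the paper.
# Each row: (followers/followings ratio, fraction of tweets containing a URL)
train_X = [(0.05, 0.9), (0.10, 0.8), (0.02, 0.95), (1.5, 0.1), (2.0, 0.2), (0.9, 0.05)]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = spam account, 0 = legitimate account

test_X = [(0.08, 0.85), (1.2, 0.15), (0.5, 0.6), (1.8, 0.1)]
test_y = [1, 0, 1, 0]

y_pred = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
metrics = precision_recall_f1(test_y, y_pred)
print(y_pred, metrics)
```

In practice the features would be scaled before computing distances, and the same train/test split would be reused for every classifier so that the metric comparison is fair.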