A one-class classification approach for bot detection on Twitter

Abstract: Twitter is a popular online social network with hundreds of millions of users, yet an important share of its accounts are not operated by humans. Approximately 48 million Twitter accounts are managed by automated programs called bots, representing up to 15% of all accounts. Some bots serve benign purposes, such as automatically posting news and academic papers, or even providing help during emergencies. Nevertheless, Twitter bots have also been used for malicious purposes, such as distributing malware or influencing public perception of a topic. Existing mechanisms can detect bots on Twitter automatically; however, they rely on examples of known bots to distinguish them from legitimate accounts. As the bot landscape changes, with bot creators using more sophisticated methods to avoid detection, new mechanisms for distinguishing legitimate accounts from bot accounts are needed. In this paper, we propose using one-class classification to enhance Twitter bot detection, as it allows detecting novel bot accounts and requires only examples of legitimate accounts. Our experimental results show that our proposal consistently detects different types of bots with an AUC above 0.89, without requiring prior information about them.
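The core idea, training only on legitimate accounts and scoring how far a new account departs from them, can be illustrated with a minimal sketch. This is not the classifier evaluated in the paper: the nearest-neighbor distance score, the two account features (tweets per day and follower-to-friend ratio), and the synthetic data are all assumptions chosen purely for illustration.

```python
import math
import random


def fit_scaler(rows):
    """Per-feature mean and standard deviation, computed from legitimate accounts only."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a feature is constant, to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    """Standardize one account using the statistics of the legitimate class."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def anomaly_score(row, train, k=3):
    """Mean Euclidean distance to the k nearest legitimate training accounts."""
    dists = sorted(math.dist(row, t) for t in train)
    return sum(dists[:k]) / k


def auc(bot_scores, human_scores):
    """Probability that a random bot outscores a random human (ties count half)."""
    wins = 0.0
    for b in bot_scores:
        for h in human_scores:
            if b > h:
                wins += 1.0
            elif b == h:
                wins += 0.5
    return wins / (len(bot_scores) * len(human_scores))


def demo(seed=0):
    """Train on synthetic humans only, then score unseen humans and bots."""
    rng = random.Random(seed)

    # Hypothetical features: [tweets per day, follower-to-friend ratio].
    def human():
        return [rng.gauss(5, 2), rng.gauss(1.0, 0.3)]

    def bot():
        return [rng.gauss(60, 10), rng.gauss(0.1, 0.05)]

    train_raw = [human() for _ in range(200)]  # no bot examples are used for training
    means, stds = fit_scaler(train_raw)
    train = [scale(r, means, stds) for r in train_raw]

    test_humans = [scale(human(), means, stds) for _ in range(50)]
    test_bots = [scale(bot(), means, stds) for _ in range(50)]

    human_scores = [anomaly_score(r, train) for r in test_humans]
    bot_scores = [anomaly_score(r, train) for r in test_bots]
    return auc(bot_scores, human_scores)


print("AUC on synthetic data:", demo())
```

Because the model never sees a bot during training, any account type that departs from legitimate behavior receives a high score, including types that did not exist when the model was built. The synthetic bots here are deliberately easy to separate, so the resulting AUC says nothing about performance on real Twitter data.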
