Forecasting Suspicious Account Activity at Large-Scale Online Service Providers

Large online services face large-scale automated social engineering attacks, so fast detection and remediation of compromised accounts are crucial to limit the spread of new attacks and to mitigate the overall damage to users, companies, and the public at large. We advocate a fully automated approach based on machine learning: we develop an early warning system that uses account activity traces to predict which accounts are likely to be compromised in the future and to generate suspicious activity. We hypothesize that such early warning is key to more timely detection of compromised accounts and, consequently, to faster remediation. We demonstrate the feasibility and applicability of the system through an experiment at a large-scale online service provider, using four months of real-world production data encompassing hundreds of millions of users. We show that, even using only login data to derive features with low computational cost and a basic model selection approach, our classifier can be tuned to achieve good classification precision when used for forecasting. Our system correctly identifies, up to one month in advance, the accounts later flagged as suspicious, with precision, recall, and false positive rates that indicate the mechanism is likely to prove valuable in operational settings to support additional layers of defense.
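
To make concrete what tuning a classifier for precision can mean, the following minimal sketch picks a decision threshold on classifier scores so that a target precision is met, and then reports the resulting recall and false positive rate. It is an illustration only, not the system described here: the function names, the scores, the labels, and the target value are all hypothetical.

```python
from typing import List, Optional, Tuple


def metrics_at(scores: List[float], labels: List[int],
               threshold: float) -> Tuple[float, float, float]:
    """Return (precision, recall, false positive rate) when every account
    with score >= threshold is flagged. Labels: 1 = later suspicious, 0 = not."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, recall, fpr


def lowest_threshold_for_precision(scores: List[float], labels: List[int],
                                   target: float) -> Optional[float]:
    """Return the lowest observed score that, used as a threshold, reaches
    the target precision, or None if no threshold does. Lowering the
    threshold can only flag more accounts, so the lowest qualifying
    threshold also has the highest recall among qualifying thresholds."""
    for threshold in sorted(set(scores)):
        precision, _, _ = metrics_at(scores, labels, threshold)
        if precision >= target:
            return threshold
    return None


if __name__ == "__main__":
    # Hypothetical classifier scores and ground-truth labels for ten accounts.
    scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
    labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
    threshold = lowest_threshold_for_precision(scores, labels, target=0.75)
    if threshold is not None:
        print(threshold, metrics_at(scores, labels, threshold))
```

Raising the threshold generally trades recall for precision, which is why the three rates are best reported together. In a setting with hundreds of millions of users and few compromised accounts, even a small false positive rate can correspond to a large absolute number of flagged accounts.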
