Íntegro: Leveraging victim prediction for robust fake account detection in large scale OSNs

Abstract

Detecting fake accounts in online social networks (OSNs) protects both OSN operators and their users from various malicious activities. Most detection mechanisms attempt to classify user accounts as real (i.e., benign, honest) or fake (i.e., malicious, Sybil) by analyzing either user-level activities or graph-level structures. These mechanisms, however, are not robust against adversarial attacks in which fake accounts cloak their operation with patterns resembling real user behavior. In this article, we show that victims, that is, real accounts whose users have accepted friend requests sent by fakes, form a distinct classification category that is useful for designing robust detection mechanisms. In particular, we present Íntegro, a robust and scalable defense system that leverages victim classification to rank most real accounts higher than fakes, so that OSN operators can take action against low-ranking fake accounts. Íntegro starts by identifying potential victims from user-level activities using supervised machine learning. It then annotates the graph by assigning lower weights to edges incident to potential victims. Finally, Íntegro ranks user accounts based on the landing probability of a short random walk that starts from a known real account. Because this walk is unlikely to traverse low-weight edges in a few steps and land on fakes, Íntegro achieves the desired ranking. We implemented Íntegro on widely used, open-source distributed computing platforms, where it scaled nearly linearly. We evaluated Íntegro against SybilRank, the state of the art in fake account detection, using real-world datasets and a large-scale deployment at Tuenti, the largest OSN in Spain, with more than 15 million active users. We show that Íntegro significantly outperforms SybilRank in user ranking quality, with the only requirement being that the employed victim classifier is better than random. Moreover, the deployment of Íntegro at Tuenti resulted in up to an order of magnitude higher precision in fake account detection compared to SybilRank.
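
The three steps described in the abstract (victim prediction, edge re-weighting, and ranking by a short random walk) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy graph, the victim probabilities, the 0.5 threshold, the weight rule `1 - p`, and the function names `edge_weights` and `landing_probabilities` are all assumptions made for illustration; the abstract states only that edges incident to potential victims receive lower weights and that accounts are ranked by the landing probability of a short walk from a known real account.

```python
from collections import defaultdict


def edge_weights(edges, victim_prob, threshold=0.5):
    """Give a lower weight to edges incident to a predicted victim.

    The rule (weight = 1 - p when p exceeds the threshold) is an
    illustrative assumption, not the weighting used in the paper.
    """
    weights = {}
    for u, v in edges:
        p = max(victim_prob.get(u, 0.0), victim_prob.get(v, 0.0))
        weights[(u, v)] = 1.0 - p if p > threshold else 1.0
    return weights


def landing_probabilities(weights, seed, steps):
    """Landing probability of a weighted random walk after `steps` steps."""
    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w  # undirected friendship graph
    prob = {node: 0.0 for node in adj}
    prob[seed] = 1.0
    for _ in range(steps):
        nxt = {node: 0.0 for node in adj}
        for u, mass in prob.items():
            if mass == 0.0:
                continue
            total = sum(adj[u].values())
            if total == 0.0:
                nxt[u] += mass  # no usable edge: the walk stays put
                continue
            for v, w in adj[u].items():
                nxt[v] += mass * w / total
        prob = nxt
    return prob


# Hypothetical toy graph: r0..r2 are real, "v" is a real victim, f0..f1 are fake.
edges = [("r0", "r1"), ("r0", "r2"), ("r1", "r2"),
         ("r1", "v"), ("r2", "v"), ("v", "f0"), ("f0", "f1")]
# Hypothetical output of a victim classifier (probability of being a victim).
victim_prob = {"v": 0.9}

scores = landing_probabilities(edge_weights(edges, victim_prob), "r0", steps=3)
baseline = landing_probabilities(edge_weights(edges, {}), "r0", steps=3)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy example the walk with victim-aware weights leaves less probability mass on `f0` than the unweighted walk does, which is the effect the abstract attributes to low-weight edges.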
