Mining (Social) Network Graphs to Detect Random Link Attacks

Modern communication networks are vulnerable to attackers who send unsolicited messages to innocent users, wasting network resources and user time. Examples of such attacks include spam email, unwanted telemarketing calls, and viral marketing in social networks. Existing techniques for identifying these attacks are tailored to specific domains (such as email spam filtering) and do not carry over to most other networks. We provide a generic abstraction of such attacks, called the Random Link Attack (RLA), that describes a large class of attacks in communication networks. In an RLA, a malicious user creates a set of false identities and uses them to communicate with a large, random set of innocent users. We find RLAs by mining the social network graph extracted from user interactions in the communication network. To the best of our knowledge, this is the first attempt to formulate an attack definition that applies to a variety of communication networks. In this paper, we formally define the RLA and show that the problem of finding an RLA is NP-complete. We also provide two efficient heuristics for mining subgraphs that satisfy the RLA property: the first (GREEDY) is based on greedy set expansion, and the second (TRWALK) on randomized graph traversal. Our experiments with a real-life data set demonstrate the effectiveness of these algorithms.
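The attack model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch and is not the paper's GREEDY or TRWALK heuristic, nor its formal RLA property. The toy network layout, the function names (`build_toy_network`, `inject_rla`, `local_clustering`), and the use of the local clustering coefficient as a warning signal are all assumptions made for illustration. The idea is that innocent users tend to talk to people who also talk to each other, whereas a false identity that contacts random users has neighbors who are mostly strangers to one another.

```python
import random


def add_edge(graph, u, v):
    """Add an undirected edge u-v to an adjacency-set graph."""
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def build_toy_network(num_users=60, group_size=6):
    """Hypothetical innocent network: tight friend groups joined in a ring.

    Assumes num_users is a multiple of group_size.
    """
    graph = {}
    users = list(range(num_users))
    for start in range(0, num_users, group_size):
        group = users[start:start + group_size]
        # Everyone inside a group communicates with everyone else in it.
        for i, u in enumerate(group):
            for v in group[i + 1:]:
                add_edge(graph, u, v)
        # One link to the next group keeps the whole network connected.
        add_edge(graph, group[-1], users[(start + group_size) % num_users])
    return graph


def inject_rla(graph, num_fake=3, victims_per_fake=10, seed=1):
    """Simulate the attack: each false identity contacts random innocents."""
    rng = random.Random(seed)
    innocents = sorted(graph)  # snapshot taken before any fake node exists
    fakes = [f"fake{i}" for i in range(num_fake)]
    for fake in fakes:
        for victim in rng.sample(innocents, victims_per_fake):
            add_edge(graph, fake, victim)
    return fakes


def local_clustering(graph, node):
    """Fraction of pairs of a node's neighbors that are themselves linked."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i, u in enumerate(nbrs)
        for v in nbrs[i + 1:]
        if v in graph[u]
    )
    return 2.0 * links / (k * (k - 1))


if __name__ == "__main__":
    graph = build_toy_network()
    fakes = inject_rla(graph)
    for fake in fakes:
        print(fake, round(local_clustering(graph, fake), 3))
    innocent_scores = [
        local_clustering(graph, n) for n in graph if n not in fakes
    ]
    print("mean innocent", round(sum(innocent_scores) / len(innocent_scores), 3))
```

In this toy setting the false identities score noticeably lower than the average innocent user. Real interaction graphs are far noisier, so a low clustering coefficient alone should be read as a hint rather than a detection rule.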
