CopyCatch: stopping group attacks by spotting lockstep behavior in social networks

How can web services that depend on user generated content discern fraudulent input by spammers from legitimate input? In this paper we focus on the social network Facebook and the problem of discerning ill-gotten Page Likes, made by spammers hoping to turn a profit, from legitimate Page Likes. Our method, which we refer to as CopyCatch, detects lockstep Page Like patterns on Facebook by analyzing only the social graph between users and Pages and the times at which the edges in the graph (the Likes) were created. We offer the following contributions: (1) We give a novel problem formulation, with a simple concrete definition of suspicious behavior in terms of graph structure and edge constraints. (2) We offer two algorithms to find such suspicious lockstep behavior - one provably-convergent iterative algorithm and one approximate, scalable MapReduce implementation. (3) We show that our method severely limits "greedy attacks" and analyze the bounds from the application of the Zarankiewicz problem to our setting. Finally, we demonstrate and discuss the effectiveness of CopyCatch at Facebook and on synthetic data, as well as potential extensions to anomaly detection problems in other domains. CopyCatch is actively in use at Facebook, searching for attacks on Facebook's social graph of over a billion users, many millions of Pages, and billions of Page Likes.

[1]  René Peeters,et al.  The maximum edge biclique problem is NP-complete , 2003, Discret. Appl. Math..

[2]  Inderjit S. Dhillon,et al.  A generalized maximum entropy approach to bregman co-clustering and matrix approximation , 2004, J. Mach. Learn. Res..

[3]  Peter Steenkiste,et al.  Network Anomaly Detection Using Co-clustering , 2012, 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.

[4]  ChengYizong Mean Shift, Mode Seeking, and Clustering , 1995 .

[5]  Jian Pei,et al.  On mining cross-graph quasi-cliques , 2005, KDD '05.

[6]  Zoltán Füredi,et al.  An Upper Bound on Zarankiewicz' Problem , 1996, Combinatorics, Probability and Computing.

[7]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[8]  Anirban Dasgupta,et al.  Approximation algorithms for co-clustering , 2008, PODS.

[9]  Koby Crammer,et al.  A needle in a haystack: local one-class optimization , 2004, ICML.

[10]  Joydeep Ghosh,et al.  Robust one-class clustering using hybrid global and local search , 2005, ICML.

[11]  Nikos D. Sidiropoulos,et al.  Co-clustering as multilinear decomposition with sparse latent factors , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[12]  Srujana Merugu,et al.  A scalable collaborative filtering framework based on co-clustering , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[13]  Christos Faloutsos,et al.  Netprobe: a fast and scalable system for fraud detection in online auction networks , 2007, WWW '07.

[14]  J. Bernstein AN UPPER BOUND , 2011 .

[15]  Christos Faloutsos,et al.  MultiAspectForensics: Pattern Mining on Large-Scale Heterogeneous Networks with Tensor Analysis , 2011, 2011 International Conference on Advances in Social Networks Analysis and Mining.

[16]  Inderjit S. Dhillon,et al.  Information-theoretic co-clustering , 2003, KDD '03.

[17]  Erdong Chen,et al.  Facebook immune system , 2011, SNS '11.

[18]  Yizong Cheng,et al.  Mean Shift, Mode Seeking, and Clustering , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  Christos Faloutsos,et al.  RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[20]  Jimeng Sun,et al.  DisCo: Distributed Co-clustering with Map-Reduce: A Case Study towards Petabyte-Scale End-to-End Mining , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[21]  Christos Faloutsos,et al.  EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs , 2009, 2009 IEEE International Conference on Data Mining Workshops.

[22]  Hans-Peter Kriegel,et al.  Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering , 2009, TKDD.

[23]  Zheng Shao,et al.  Data warehousing and analytics infrastructure at facebook , 2010, SIGMOD Conference.