Inferring lockstep behavior from connectivity pattern in large graphs

Given multimillion-node graphs such as “who-follows-whom”, “patent-cites-patent”, “user-likes-page” and “actor/director-makes-movie” networks, how can we find unexpected behaviors? When companies with monetary incentives operate on these graphs to sell Twitter “Followers” and Facebook page “Likes”, the graphs show strange connectivity patterns. In this paper, we study a complete graph from a large Twitter-style social network, spanning up to 3.33 billion edges. We report strange deviations from typical patterns such as smooth degree distributions. We find that such deviations are often due to “lockstep behavior”, in which large groups of followers connect to the same groups of followees. Our first contribution is a study of strange patterns in the adjacency matrix and in the spectral subspaces with respect to several flavors of lockstep. We discover that (a) lockstep behavior on the graph shapes dense “blocks” in its adjacency matrix and creates “rays” in spectral subspaces, and (b) partially overlapping lockstep behaviors shape “staircases” in its adjacency matrix and create “pearls” in spectral subspaces. Our second contribution is a fast algorithm, which uses these discoveries as a guide for practitioners, to detect users who exhibit lockstep behavior in undirected, directed, and bipartite graphs. We carry out extensive experiments on both synthetic and real datasets, as well as public datasets from IMDb and US Patent. The results demonstrate the scalability and effectiveness of our proposed algorithm.
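To make the link between a dense adjacency "block" and the spectral subspace concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It plants one fully connected follower-by-followee block in a small, sparse random bipartite graph and approximates the leading left singular vector with plain power iteration (swapped in here for a full SVD so that the example needs only the standard library). The function names, graph sizes, noise level, and seed are all assumptions chosen for illustration.

```python
import random


def planted_block_graph(n_followers=60, n_followees=40,
                        block_rows=15, block_cols=10,
                        noise=0.02, seed=0):
    """Hypothetical toy graph: sparse random edges plus one dense block.

    Rows are followers, columns are followees. The first `block_rows`
    followers all follow the first `block_cols` followees (lockstep).
    """
    rng = random.Random(seed)
    A = [[1 if rng.random() < noise else 0 for _ in range(n_followees)]
         for _ in range(n_followers)]
    for i in range(block_rows):
        for j in range(block_cols):
            A[i][j] = 1
    return A


def top_left_singular_vector(A, iters=100):
    """Approximate the leading left singular vector by power iteration
    on A * A^T (an illustrative stand-in for a real SVD routine)."""
    n, m = len(A), len(A[0])
    u = [1.0] * n
    for _ in range(iters):
        # v = A^T u, then u = A v, then renormalize u to unit length.
        v = [sum(A[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]
        norm = sum(x * x for x in u) ** 0.5
        u = [x / norm for x in u]
    return u


A = planted_block_graph()
u = top_left_singular_vector(A)

# Followers with the largest coordinates are the lockstep candidates.
suspects = sorted(range(len(u)), key=lambda i: -u[i])[:15]
print(sorted(suspects))
```

In this toy setting the lockstep followers receive nearly identical, large coordinates in the leading singular vector, while the remaining followers stay close to zero. Plotting pairs of such vectors against each other is what would reveal the "rays" described above; the thresholding step here (taking the top 15 coordinates) is an assumption that relies on knowing the planted block size.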
