Patterns and anomalies in k-cores of real-world graphs with applications

What do the k-core structures of real-world graphs look like? What are the common patterns and the anomalies? How can we exploit them in applications? A k-core is the maximal subgraph in which every vertex has degree at least k. This concept has been applied to areas as diverse as hierarchical structure analysis, graph visualization, and graph clustering. Here, we explore pervasive patterns related to k-cores that emerge in graphs from diverse domains. Our discoveries are: (1) Mirror Pattern: the coreness of a vertex (i.e., the maximum k such that the vertex belongs to the k-core) is strongly correlated with its degree. (2) Core-Triangle Pattern: degeneracy (i.e., the maximum k such that the k-core exists) obeys a 3-to-1 power law with respect to the count of triangles. (3) Structured Core Pattern: degeneracy-cores are not cliques but have non-trivial structures such as core-periphery and communities. Our algorithmic contributions show the usefulness of these patterns: (1) Core-A, which measures the deviation from Mirror Pattern, successfully spots anomalies in real-world graphs. (2) Core-D, a single-pass streaming algorithm based on Core-Triangle Pattern, accurately estimates degeneracy up to 12× faster than its competitor. (3) Core-S, inspired by Structured Core Pattern, identifies influential spreaders up to 17× faster than its competitors with comparable accuracy.
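
To make the definitions concrete, the following is a minimal Python sketch of the three quantities the patterns relate: coreness, degeneracy, and triangle count. It uses the textbook peeling procedure (repeatedly remove a minimum-degree vertex) rather than any algorithm from the paper. The function names `build_adjacency`, `coreness`, and `count_triangles` and the toy graph are assumptions made for illustration, and the code is not an implementation of Core-A, Core-D, or Core-S.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected simple graph as {vertex: set(neighbors)}; self-loops are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def coreness(adj):
    """Coreness of every vertex, found by peeling minimum-degree vertices.

    Illustrative only: the linear scan for the minimum makes this quadratic
    in the number of vertices.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # the running k never decreases during peeling
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def count_triangles(adj):
    """Global triangle count; each triangle is seen from 6 ordered edges."""
    return sum(len(adj[u] & adj[v]) for u in adj for v in adj[u]) // 6


# Hypothetical toy graph: a 4-clique {a, b, c, d}, a path d-e-f hanging off it,
# and a separate star centered at h with five leaves.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("e", "f")] + [("h", f"x{i}") for i in range(1, 6)]

adj = build_adjacency(edges)
core = coreness(adj)
print("degeneracy:", max(core.values()))    # 3, from the 4-clique
print("triangles:", count_triangles(adj))   # 4, all inside the 4-clique
for v in sorted(adj):
    print(v, "degree", len(adj[v]), "coreness", core[v])
```

In this toy graph the star center `h` has the highest degree (5) but coreness 1, so it sits far from the degree-coreness trend that the clique vertices follow. This is only an intuition for the kind of mismatch a deviation-from-Mirror-Pattern score would flag; how Core-A actually scores vertices is not specified here.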
