A Local Algorithm for Structure-Preserving Graph Cut

Large-scale graph data is generated in a variety of real-world applications, from social networks to co-authorship networks, and from protein-protein interaction networks to road traffic networks. Many existing works on graph mining focus on vertices and edges, with the first-order Markov chain as the underlying model. They do not explore high-order network structures, which are of key importance in many high-impact domains. For example, in bank customer personally identifiable information (PII) networks, star structures often correspond to a set of synthetic identities; in financial transaction networks, loop structures may indicate money laundering. In this paper, we focus on mining user-specified high-order network structures and aim to find a structure-rich subgraph whose separation from the rest of the graph breaks few such structures. A key challenge in finding a structure-rich subgraph is the prohibitive computational cost. To address this problem, we draw on the family of local graph clustering algorithms, which efficiently identify a low-conductance cut without exploring the entire graph, and generalize their key idea to model high-order network structures. In particular, we start with a generic definition of high-order conductance and define the high-order diffusion core, which is based on a high-order random walk induced by a user-specified high-order network structure. We then propose a novel High-Order Structure-Preserving LOcal Cut (HOSPLOC) algorithm, which runs in polylogarithmic time with respect to the number of edges in the graph. It starts with a seed vertex and iteratively explores its neighborhood until a subgraph with a small high-order conductance is found. Furthermore, we analyze its performance in terms of both effectiveness and efficiency. Experimental results on both synthetic and real graphs demonstrate the effectiveness and efficiency of the proposed HOSPLOC algorithm.
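
To make the notion of conductance concrete, the following minimal Python sketch compares ordinary edge conductance with one possible triangle-based analogue on a toy graph. It is an illustration only: the abstract does not state the paper's generic definition of high-order conductance, so the triangle-based formula, the function names (`edge_conductance`, `triangles`, `triangle_conductance`), and the toy graph are all assumptions introduced here, not the HOSPLOC method.

```python
from itertools import combinations


def edge_conductance(adj, subset):
    """Standard conductance: cut edges / min(vol(S), vol(rest))."""
    S = set(subset)
    vol_S = sum(len(adj[v]) for v in S)
    vol_rest = sum(len(adj[v]) for v in adj if v not in S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else float("inf")


def triangles(adj):
    """Enumerate all triangles as frozensets of three vertices."""
    found = set()
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                found.add(frozenset((u, v, w)))
    return found


def triangle_conductance(adj, subset):
    """Assumed triangle-based analogue (not taken from the paper):
    triangles split by the cut, divided by the smaller side's
    triangle volume (number of triangle endpoints on that side)."""
    S = set(subset)
    tris = triangles(adj)
    cut = sum(1 for t in tris if 0 < len(t & S) < 3)
    vol_S = sum(len(t & S) for t in tris)
    vol_rest = sum(len(t - S) for t in tris)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else float("inf")


# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5}
# joined by the single edge 2-3.
toy = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

whole_triangle = {0, 1, 2}   # keeps both triangles intact
split_triangle = {0, 1}      # breaks the triangle {0,1,2}

phi_edge_whole = edge_conductance(toy, whole_triangle)
phi_tri_whole = triangle_conductance(toy, whole_triangle)
phi_tri_split = triangle_conductance(toy, split_triangle)
```

On this toy graph, cutting along the bridge edge breaks no triangle, so the triangle-based score for `whole_triangle` is zero even though one edge is cut, while `split_triangle` is penalized for breaking a triangle. This mirrors the goal stated above of finding a subgraph whose separation breaks few of the chosen structures.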
