Computing heat kernel pagerank and a local clustering algorithm

Abstract

Heat kernel pagerank is a variation of Personalized PageRank given in an exponential formulation. In this work, we present a sublinear time algorithm for approximating the heat kernel pagerank of a graph. The algorithm works by simulating random walks of bounded length and runs in time $O\left(\frac{\log(\epsilon^{-1})\,\log n}{\epsilon^{3}\,\log\log(\epsilon^{-1})}\right)$, assuming that performing a random walk step and sampling from a distribution with bounded support each take constant time. The quantitative ranking of vertices obtained with heat kernel pagerank can be used for local clustering algorithms. We present an efficient local clustering algorithm that finds cuts by performing a sweep over a heat kernel pagerank vector, using the heat kernel pagerank approximation algorithm as a subroutine. Specifically, we show that for a subset $S$ of Cheeger ratio $\phi$, many vertices in $S$ may serve as seeds for a heat kernel pagerank vector which will find a cut of conductance $O(\phi)$.
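The two ideas in the abstract, estimating heat kernel pagerank from bounded-length random walks and then sweeping over the resulting vector, can be illustrated with a minimal sketch. It assumes the standard definition $\rho_{t,f} = e^{-t}\sum_{k\ge 0}\frac{t^k}{k!} f P^k$ with $P$ the random walk transition matrix, which the abstract does not state explicitly. The function names (`sample_poisson`, `approx_hkpr`, `sweep_cut`), the adjacency-list representation, and the parameters `num_walks` and `max_len` are hypothetical and chosen by the caller; the sketch does not reproduce the paper's specific parameter settings or its guarantees.

```python
import math
import random
from collections import Counter


def sample_poisson(t, rng):
    """Draw k with probability e^{-t} t^k / k! (Knuth's multiplication method)."""
    threshold = math.exp(-t)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def approx_hkpr(adj, seed, t, num_walks, max_len, rng=None):
    """Estimate heat kernel pagerank for a single seed vertex.

    adj maps each vertex to a non-empty list of neighbours (assumed:
    undirected, no self-loops, no isolated vertices). Each walk takes a
    Poisson(t) number of steps, truncated at max_len, and the estimate is
    the empirical distribution of the walk endpoints.
    """
    rng = rng or random.Random()
    counts = Counter()
    for _ in range(num_walks):
        steps = min(sample_poisson(t, rng), max_len)
        v = seed
        for _ in range(steps):
            v = rng.choice(adj[v])
        counts[v] += 1
    return {v: c / num_walks for v, c in counts.items()}


def sweep_cut(adj, rho):
    """Sweep over vertices by rho[v] / deg(v); return the best prefix and its ratio."""
    vol_total = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(rho, key=lambda v: rho[v] / len(adj[v]), reverse=True)
    in_set, vol, cut = set(), 0, 0
    best_set, best_ratio = None, float("inf")
    for v in order:
        # Edges from v into the current set stop being cut edges;
        # the remaining edges at v become cut edges.
        inside = sum(1 for w in adj[v] if w in in_set)
        cut += len(adj[v]) - 2 * inside
        vol += len(adj[v])
        in_set.add(v)
        denom = min(vol, vol_total - vol)
        if denom > 0 and cut / denom < best_ratio:
            best_ratio = cut / denom
            best_set = set(in_set)
    return best_set, best_ratio


# Example: two triangles joined by a single bridge edge (2, 3).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
rho = approx_hkpr(graph, seed=0, t=2.0, num_walks=5000, max_len=10,
                  rng=random.Random(0))
cluster, ratio = sweep_cut(graph, rho)
print(sorted(cluster), ratio)
```

On this small graph the sweep should separate the seed's triangle from the other one. Only vertices that some walk ended on are visited, so the sweep touches only a neighbourhood of the seed rather than the whole graph.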
