Localization in Seeded PageRank

Seeded PageRank is an important network analysis tool for identifying and studying regions near a given set of nodes, called seeds. The seeded PageRank vector is the stationary distribution of a random walk that randomly resets at the seed nodes. Intuitively, this vector is concentrated near the given seeds, yet it is mathematically non-zero for every node in a connected graph. We study this concentration, or localization, and show a sublinear upper bound on the number of entries required to approximate seeded PageRank on all graphs with a natural type of skewed degree sequence, similar to those that arise in many real-world networks. Experiments with both real-world and synthetic graphs provide further evidence that the degree sequence of a graph has a major influence on the localization behavior of seeded PageRank. Moreover, we establish that this localization is non-trivial by showing that complete bipartite graphs produce seeded PageRank vectors that cannot be approximated with a sublinear number of non-zeros.
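As a rough illustration of the definition above, the following sketch computes a seeded PageRank vector by power iteration on a small path graph and counts how many of its largest entries are needed to capture all but a small amount of its mass. It is not the method analyzed in the paper. The function names, the teleportation parameter alpha = 0.85, the uniform reset over the seeds, and the 1-norm accuracy measure are all assumptions made for this example.

```python
def seeded_pagerank(adj, seeds, alpha=0.85, tol=1e-12, max_iter=10000):
    """Power iteration for seeded PageRank (illustrative sketch).

    adj   : dict mapping each node to a list of its neighbors
            (assumed undirected, with no isolated nodes)
    seeds : nodes the walk resets to, chosen uniformly (an assumption)
    alpha : probability of following an edge instead of resetting (an assumption)
    """
    nodes = list(adj)
    seed_vec = {v: 0.0 for v in nodes}
    for v in seeds:
        seed_vec[v] = 1.0 / len(seeds)

    x = dict(seed_vec)
    for _ in range(max_iter):
        # With probability 1 - alpha the walk resets to a seed.
        nxt = {v: (1.0 - alpha) * seed_vec[v] for v in nodes}
        # With probability alpha it moves to a uniformly random neighbor.
        for u in nodes:
            share = alpha * x[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[v] - x[v]) for v in nodes)
        x = nxt
        if delta < tol:
            break
    return x


def nonzeros_needed(x, eps):
    """Smallest number of largest entries to keep so the dropped mass is at most eps."""
    vals = sorted(x.values(), reverse=True)
    total = sum(vals)
    kept = 0.0
    for k, val in enumerate(vals, start=1):
        kept += val
        if total - kept <= eps:
            return k
    return len(vals)


# Hypothetical test graph: a path on 50 nodes, seeded at one end.
n = 50
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
x = seeded_pagerank(graph, seeds=[0])
k = nonzeros_needed(x, eps=0.01)
print(f"entries needed for 1-norm error <= 0.01: {k} of {n}")
```

On this toy graph every entry of the vector is positive, yet a small number of entries near the seed carry almost all of the mass. Repeating the experiment on a complete bipartite graph is one way to observe the non-localized behavior mentioned in the abstract.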
