PageRank on an evolving graph

One of the most important features of the Web graph and social networks is that they are constantly evolving. The classical computational paradigm, which assumes a fixed data set as input to an algorithm that terminates, is inadequate for such settings. In this paper we study the problem of computing PageRank on an evolving graph. We propose an algorithm that, at any moment in time and by crawling only a small portion of the graph, provides an estimate of the PageRank that is close to the true PageRank of the graph at that moment. Under a stylized model of graph evolution, we show that our algorithm achieves a provable performance guarantee that is significantly better than that of the naive algorithm that crawls the nodes in round-robin fashion. We also evaluate our algorithm experimentally on real data sets and on randomly generated inputs.
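To make the setting concrete, the following is a minimal sketch of the naive baseline mentioned above, not of the proposed algorithm, whose details the abstract does not give. It assumes the graph is stored as an adjacency dictionary, that one node's out-links can be refreshed per time step, and that PageRank is recomputed on the possibly stale snapshot by plain power iteration. The names `pagerank` and `round_robin_probe`, the damping factor 0.85, and the iteration count are illustrative assumptions.

```python
from typing import Dict, List

# Assumed representation: node id -> list of out-neighbours.
Graph = Dict[int, List[int]]


def pagerank(graph: Graph, damping: float = 0.85, iters: int = 100) -> Dict[int, float]:
    """Power-iteration PageRank on a fixed snapshot of the graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every node receives the uniform teleportation mass.
        nxt = {v: (1.0 - damping) / n for v in nodes}
        dangling = 0.0
        for v in nodes:
            # Ignore links to nodes the snapshot does not know about.
            outs = [u for u in graph[v] if u in graph]
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    nxt[u] += share
            else:
                # Nodes without usable out-links spread their mass uniformly.
                dangling += damping * rank[v]
        for v in nodes:
            nxt[v] += dangling / n
        rank = nxt
    return rank


def round_robin_probe(true_graph: Graph, snapshot: Graph, step: int) -> int:
    """Naive baseline: refresh one node's out-links per step, in fixed cyclic order."""
    nodes = sorted(true_graph)
    v = nodes[step % len(nodes)]
    snapshot[v] = list(true_graph[v])  # overwrite the stale adjacency list
    return v
```

In use, one would call `round_robin_probe` once per time step while the true graph changes, and call `pagerank(snapshot)` whenever an estimate is needed. The estimate is only as fresh as the snapshot, which is the weakness of round-robin probing that motivates a smarter choice of which nodes to crawl.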
