A Singular Perturbation Approach for Choosing the PageRank Damping Factor

We study the PageRank mass of the principal components of a bow-tie web graph as a function of the damping factor c. It is known that the web graph can be divided into three principal components: SCC, IN, and OUT. The giant strongly connected component (SCC) contains a large group of pages, any two of which are connected by a hyperlink path. The pages in the IN (OUT) component have a path to (from) the SCC, but not back. Using a singular perturbation approach, we show that the combined PageRank share of the IN and SCC components remains high even when the damping factor is very close to one, even though this share drops to zero as c tends to one. However, a detailed study of the OUT component reveals the presence of "dead ends" (small groups of pages linking only to each other) that receive an unfairly high ranking when c is close to one. We argue that this problem can be mitigated by choosing c as small as ½.
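The limiting behaviour described above can be illustrated on a toy graph. The sketch below is not the paper's singular perturbation analysis: it is a plain power iteration with uniform teleportation, run on a hypothetical five-page bow-tie (`BOW_TIE`, one IN page, a two-page SCC, and a two-page dead end in OUT) chosen only for illustration. The function names `pagerank` and `component_mass` are likewise assumptions. A graph this small cannot reproduce the quantitative claim that the IN and SCC share stays high for c near one; it shows only the direction of the effect, namely that the dead end absorbs mass as c grows.

```python
def pagerank(out_links, c, iters=10000, tol=1e-12):
    """Power iteration for pi = c * pi * P + (1 - c) / n * 1^T.

    out_links[i] lists the pages that page i links to.
    Assumes every page has at least one out-link (no dangling pages).
    """
    n = len(out_links)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # Teleportation term: uniform jump with probability 1 - c.
        new = [(1.0 - c) / n] * n
        # Link-following term: page i splits c * pi[i] among its out-links.
        for i, targets in enumerate(out_links):
            share = c * pi[i] / len(targets)
            for j in targets:
                new[j] += share
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def component_mass(pi, pages):
    """Total PageRank of a set of pages."""
    return sum(pi[i] for i in pages)


# Hypothetical toy bow-tie: page 0 is IN, pages 1-2 form the SCC,
# pages 3-4 form a dead end in OUT (they link only to each other).
BOW_TIE = [[1], [2], [1, 3], [4], [3]]
IN_SCC = [0, 1, 2]
OUT = [3, 4]

if __name__ == "__main__":
    for c in (0.5, 0.85, 0.99):
        pi = pagerank(BOW_TIE, c)
        print(c, component_mass(pi, IN_SCC), component_mass(pi, OUT))
```

Running the loop for several values of c shows the OUT mass increasing with c at the expense of IN and SCC, which is the dead-end effect the abstract refers to.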
