A Survey on PageRank Computing

This survey reviews research on PageRank computing. The components of a PageRank vector serve as authority weights for web pages; they are independent of the pages' textual content and based solely on the hyperlink structure of the web. PageRank is typically used as a component of web search ranking, which is why the model and the data structures that underlie PageRank processing matter. Computing even a single PageRank vector is a difficult computational task, and computing many is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRank and the role of such indices in applications other than web search. Finally, we discuss link-based search personalization and outline some aspects of PageRank infrastructure, from associated measures of convergence to link preprocessing.
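As a rough illustration of how authority weights can be derived from link structure alone, the sketch below runs the standard power iteration on a toy graph. It is not taken from the survey: the function name `pagerank`, the damping value 0.85, the uniform treatment of dangling pages, and the four-page graph `toy_web` are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration on a small graph (illustrative sketch only).

    links maps each page to the list of pages it links to.
    damping, tol and max_iter are assumed values, not from the survey.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Each page passes its rank in equal shares along its out-links.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new_rank[q] += damping * rank[p] / len(outs)
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # L1 distance between successive iterates
            break
    return rank


# Hypothetical four-page web: page "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

On this toy graph the page with the most in-links, `c`, receives the largest weight, and `d`, which no page links to, receives only the uniform teleportation share. On a graph the size of the web the same iteration has to be organized around storage and I/O limits, which is where the computational difficulty arises.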
