Practical acceleration for computing the HITS ExpertRank vectors

A meaningful rank, together with efficient methods for computing it, is necessary in many application areas. Major ranking methodologies often exploit principal eigenvectors, and Kleinberg's HITS model is one such methodology. The standard approach for computing the HITS rank is the power method. Unlike PageRank, for which many acceleration schemes have been proposed, HITS has seen relatively little work on accelerating its rank calculation, mainly because the power method often works quite well in the HITS setting. However, there are cases where the power method is ineffective; moreover, a systematic acceleration over the power method is desirable even when the power method works well. We propose a practical acceleration scheme for HITS rank calculations based on a power method filtered by adaptive Chebyshev polynomials. For cases where the gap-ratio is below 0.85, for which the power method works well, our scheme is about twice as fast as the power method. For cases where the gap-ratio is unfavorable for the power method, our scheme can provide significant speedup. When ranking problems are of very large scale, even a single matrix-vector product can be expensive, which makes acceleration all the more necessary. The proposed scheme is desirable in that it consistently reduces both the number of matrix-vector products and the CPU time relative to the power method, with little memory overhead.
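To make the contrast between the plain power method and a Chebyshev-filtered one concrete, here is a minimal Python sketch for the HITS authority vector, taken as the principal eigenvector of A^T A for an adjacency matrix A. It is an illustration under stated assumptions, not the scheme of the paper: the function names, the edge-list input, the fixed filter degree, and above all the `gap_guess` parameter are hypothetical. In particular, the paper estimates the filter bounds adaptively, whereas this sketch simply damps the interval [0, gap_guess * rho], where rho is the current Rayleigh quotient.

```python
import math

def apply_ata(edges, n, x):
    """Return (A^T A) x, where A is the adjacency matrix of (src, dst) edges."""
    hub = [0.0] * n
    for i, j in edges:
        hub[i] += x[j]        # hub = A x
    auth = [0.0] * n
    for i, j in edges:
        auth[j] += hub[i]     # auth = A^T hub
    return auth

def normalize(x):
    s = math.sqrt(sum(v * v for v in x))
    return [v / s for v in x]

def diff_norm(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def hits_power(edges, n, tol=1e-10, max_iter=10000):
    """Plain power method; returns (vector, number of A^T A applications)."""
    x = normalize([1.0] * n)
    for k in range(1, max_iter + 1):
        y = normalize(apply_ata(edges, n, x))
        if diff_norm(x, y) < tol:
            return y, k
        x = y
    return x, max_iter

def cheb_filter(edges, n, x, degree, b):
    """Apply T_degree((M - c I) / e) to x, with M = A^T A and [0, b] mapped to [-1, 1]."""
    c = e = b / 2.0
    y_prev = x
    mx = apply_ata(edges, n, x)
    y = [(m - c * v) / e for m, v in zip(mx, x)]
    for _ in range(degree - 1):
        my = apply_ata(edges, n, y)
        # Three-term Chebyshev recurrence: T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t)
        y_next = [2.0 * (m - c * v) / e - p for m, v, p in zip(my, y, y_prev)]
        y_prev, y = y, y_next
    return y

def hits_chebyshev(edges, n, degree=6, gap_guess=0.9, tol=1e-10, max_outer=1000):
    """Filtered power iteration. gap_guess is an assumed fixed ratio, not adaptive.

    Assumes the graph has at least one edge, so the Rayleigh quotient is positive.
    """
    x = normalize([1.0] * n)
    applies = 0
    for _ in range(max_outer):
        mx = apply_ata(edges, n, x)
        rho = sum(a * b for a, b in zip(x, mx))   # Rayleigh quotient, at most lambda_1
        applies += 1
        y = normalize(cheb_filter(edges, n, x, degree, gap_guess * rho))
        applies += degree
        if diff_norm(x, y) < tol:
            return y, applies
        x = y
    return x, applies

# Small hypothetical graph on 4 nodes, given as (source, destination) links.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2), (3, 1), (1, 3)]
auth_power, n_power = hits_power(edges, 4)
auth_cheb, n_cheb = hits_chebyshev(edges, 4)
```

Because the Rayleigh quotient never exceeds the largest eigenvalue, the damped interval always lies below it whenever `gap_guess` is less than 1, so the filter amplifies the principal direction relative to every other eigendirection. Both functions return the number of applications of A^T A, which allows the two approaches to be compared on a given graph; this sketch spends one extra application per outer step on the Rayleigh quotient, which a more careful implementation would reuse inside the filter.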
