Large-scale log-determinant computation through stochastic Chebyshev expansions

Logarithms of determinants of large positive definite matrices appear throughout machine learning, in applications including Gaussian graphical and Gaussian process models, partition functions of discrete graphical models, minimum-volume ellipsoids, metric learning, and kernel learning. Standard log-determinant computation relies on the Cholesky decomposition, whose cost is cubic in the number of variables (i.e., the matrix dimension), which makes it prohibitive for large-scale applications. We propose a linear-time randomized algorithm to approximate log-determinants of very large-scale positive definite and general non-singular matrices. The algorithm couples a stochastic trace approximation, called the Hutchinson method, with Chebyshev polynomial expansions; both rely on efficient matrix-vector multiplications. We establish rigorous additive and multiplicative approximation error bounds that depend on the condition number of the input matrix. In our experiments, the proposed algorithm provides highly accurate solutions orders of magnitude faster than the Cholesky decomposition and Schur completion, and enables us to compute log-determinants of matrices involving tens of millions of variables.
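The combination described above can be illustrated with a minimal sketch for the symmetric positive definite case. It is not the authors' implementation: the function names (`chebyshev_coeffs`, `logdet_chebyshev`), the parameters, and the assumption that eigenvalue bounds `lam_min` and `lam_max` are known in advance are all hypothetical choices made for illustration. The sketch uses the identity log det A = tr(log A), approximates log by a Chebyshev interpolant on the spectrum interval, and estimates the trace with Rademacher probe vectors, touching A only through matrix-vector products.

```python
import numpy as np

def chebyshev_coeffs(f, degree):
    """Coefficients c_j of the Chebyshev interpolant of f on [-1, 1]."""
    k = np.arange(degree + 1)
    nodes = np.cos(np.pi * (k + 0.5) / (degree + 1))  # Chebyshev nodes x_k
    j = k[:, None]
    T = np.cos(j * np.pi * (k + 0.5) / (degree + 1))  # T[j, k] = T_j(x_k)
    c = (2.0 / (degree + 1)) * (T @ f(nodes))
    c[0] *= 0.5
    return c

def logdet_chebyshev(matvec, dim, lam_min, lam_max,
                     degree=30, num_probes=50, seed=0):
    """Estimate log det A for symmetric positive definite A.

    matvec(V) must return A @ V for a (dim, num_probes) array V.
    lam_min and lam_max are assumed bounds on the eigenvalues of A,
    with 0 < lam_min < lam_max; degree must be at least 1.
    """
    rng = np.random.default_rng(seed)
    center = 0.5 * (lam_max + lam_min)
    half_width = 0.5 * (lam_max - lam_min)

    # log(lambda) rewritten as a function of x in [-1, 1].
    c = chebyshev_coeffs(lambda x: np.log(half_width * x + center), degree)

    def scaled_matvec(V):
        # Applies B = (A - center * I) / half_width, whose spectrum is in [-1, 1].
        return (matvec(V) - center * V) / half_width

    # Rademacher probe vectors (Hutchinson trace estimator), one per column.
    V = rng.choice([-1.0, 1.0], size=(dim, num_probes))

    # Three-term recurrence: W_{j+1} = 2 B W_j - W_{j-1}, so W_j = T_j(B) V.
    W_prev, W_curr = V, scaled_matvec(V)
    acc = c[0] * W_prev + c[1] * W_curr
    for j in range(2, degree + 1):
        W_next = 2.0 * scaled_matvec(W_curr) - W_prev
        acc += c[j] * W_next
        W_prev, W_curr = W_curr, W_next

    # Average of v^T p(B) v over all probes approximates tr(log A).
    return float(np.sum(V * acc) / num_probes)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    lam = np.linspace(1.0, 4.0, 200)
    Q, _ = np.linalg.qr(rng.standard_normal((200, 200)))
    A = (Q * lam) @ Q.T  # dense SPD test matrix with known spectrum
    estimate = logdet_chebyshev(lambda V: A @ V, 200, 1.0, 4.0)
    print(estimate, np.linalg.slogdet(A)[1])
```

Each probe costs `degree` matrix-vector products, so for a sparse matrix the total work scales with the number of nonzeros rather than cubically in the dimension. The polynomial degree needed for a given accuracy grows with the ratio `lam_max / lam_min`, consistent with error bounds that depend on the condition number.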
