Probabilistic Interpretation of Linear Solvers

This manuscript proposes a probabilistic framework for algorithms that iteratively solve unconstrained linear problems $Bx = b$ for $x$, where $B$ is positive definite. The goal is to replace the point estimates returned by existing methods with a Gaussian posterior belief over the elements of the inverse of $B$, which can be used to estimate errors. To this end, the manuscript extends recent probabilistic interpretations of the secant family of quasi-Newton optimization algorithms. Combined with properties of the conjugate gradient algorithm, this extension yields three results: uncertainty-calibrated methods with very limited cost overhead over conjugate gradients, a self-contained novel interpretation of the quasi-Newton and conjugate gradient algorithms, and a foundation for new nonlinear optimization methods.
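
To make the link between conjugate gradients and an estimate of the inverse of $B$ concrete, the following is a minimal sketch and not the manuscript's method. It runs textbook conjugate gradients on $Bx = b$ and accumulates the matrix $H = \sum_i p_i p_i^\top / (p_i^\top B p_i)$ from the search directions $p_i$; because these directions are $B$-conjugate, $H$ equals $B^{-1}$ after $n$ steps in exact arithmetic. This $H$ is only a point estimate, whereas the Gaussian posterior described above would additionally carry a covariance. The function name `cg_with_inverse_estimate`, the tolerance `tol`, and the zero starting point are assumptions made for illustration.

```python
import numpy as np


def cg_with_inverse_estimate(B, b, tol=1e-10):
    """Conjugate gradients for B x = b with symmetric positive definite B.

    Returns the iterate x and a low-rank point estimate H of inv(B) built
    from the B-conjugate search directions. Illustrative sketch only: it
    does not compute the Gaussian posterior proposed in the manuscript.
    """
    n = b.shape[0]
    x = np.zeros(n)          # assumed starting point x_0 = 0
    r = b - B @ x            # residual
    p = r.copy()             # first search direction
    H = np.zeros((n, n))     # running estimate of inv(B)
    for _ in range(n):
        rr = r @ r
        if np.sqrt(rr) < tol:
            break            # converged early; H stays low-rank
        Bp = B @ p
        pBp = p @ Bp
        alpha = rr / pBp                  # exact line search step
        x = x + alpha * p
        H = H + np.outer(p, p) / pBp      # rank-one term from direction p
        r = r - alpha * Bp
        beta = (r @ r) / rr
        p = r + beta * p                  # next B-conjugate direction
    return x, H


# Usage on a small, well-conditioned random problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B = A @ A.T + 5.0 * np.eye(5)   # symmetric positive definite by construction
b = rng.standard_normal(5)
x, H = cg_with_inverse_estimate(B, b)
print(np.linalg.norm(B @ x - b))           # residual norm of the solution
print(np.linalg.norm(H - np.linalg.inv(B)))  # error of the inverse estimate
```

With the zero starting point, the iterate satisfies $x = Hb$ at every step, so the solver's point estimate can be read as the action of the current inverse estimate on $b$.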
