Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory

We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem. The reformulations are governed by two user-defined parameters: a positive definite matrix defining a norm, and an arbitrary discrete or continuous distribution over random matrices. Our reformulation has several equivalent interpretations, allowing researchers from various communities to leverage their domain-specific insights. In particular, the reformulation can be equivalently viewed as a stochastic optimization problem, a stochastic linear system, a stochastic fixed point problem, and a probabilistic intersection problem. We establish sufficient conditions, as well as necessary and sufficient conditions, for the reformulation to be exact, that is, for its solution set to coincide with that of the original linear system. Further, we propose and analyze three stochastic algorithms for solving the reformulated problem (basic, parallel, and accelerated methods), all with global linear convergence rates. The rates can be expressed in terms of the condition number of a matrix that depends on the system matrix and on the reformulation parameters. This gives rise to a new phenomenon, which we call stochastic preconditioning: the problem of choosing the parameters (the matrix and the distribution) so that this condition number is sufficiently small. Our basic method can be equivalently interpreted as stochastic gradient descent, a stochastic Newton method, a stochastic proximal point method, a stochastic fixed point method, and a stochastic projection method, each with a fixed stepsize (relaxation parameter), applied to the corresponding reformulation.
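
To make the basic method concrete, below is a minimal Python sketch of one special case. Taking the norm-defining matrix to be the identity and the distribution over random matrices to be supported on unit coordinate vectors, the stochastic projection interpretation of the basic method reduces to the randomized Kaczmarz method: each iteration projects the current iterate onto the solution set of one randomly sampled equation. The function name, the squared-row-norm sampling probabilities, and all parameter defaults are illustrative assumptions, not notation fixed by the paper.

```python
import numpy as np

def basic_method(A, b, num_iters=10000, omega=1.0, seed=0):
    """Sketch of the basic method for a consistent system Ax = b,
    specialized to identity norm matrix and coordinate-vector sketches,
    in which case it coincides with randomized Kaczmarz."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample rows with probability proportional to their squared norms,
    # a standard choice in the randomized Kaczmarz literature.
    row_norms = np.einsum('ij,ij->i', A, A)
    probs = row_norms / row_norms.sum()
    x = np.zeros(n)
    for _ in range(num_iters):
        i = rng.choice(m, p=probs)
        a_i = A[i]
        # Stochastic projection step onto {x : a_i^T x = b_i},
        # relaxed by the fixed stepsize omega.
        x -= omega * (a_i @ x - b[i]) / row_norms[i] * a_i
    return x

# Usage on a small consistent system:
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
x = basic_method(A, b)
print(np.linalg.norm(x - x_true))  # near zero after enough iterations
```

Other choices of the norm matrix and the sketching distribution yield different special cases of the same update (e.g., randomized coordinate descent or block variants), and the resulting condition number, hence the linear rate, changes accordingly; this is the stochastic preconditioning question described above.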
