RSN: Randomized Subspace Newton

We develop a randomized Newton method capable of solving learning problems with huge dimensional feature spaces, which is a common setting in applications such as medical imaging, genomics and seismology. Our method leverages randomized sketching in a new way, by finding the Newton direction constrained to the space spanned by a random sketch. We develop a simple global linear convergence theory that holds for practically all sketching techniques, which gives the practitioners the freedom to design custom sketching approaches suitable for particular applications. We perform numerical experiments which demonstrate the efficiency of our method as compared to accelerated gradient descent and the full Newton method. Our method can be seen as a refinement and randomized extension of the results of Karimireddy, Stich, and Jaggi (2019).

[1]  Tjalling J. Ypma,et al.  Historical Development of the Newton-Raphson Method , 1995, SIAM Rev..

[2]  Alexandre d'Aspremont,et al.  Optimal Affine-Invariant Smooth Minimization Algorithms , 2018, SIAM J. Optim..

[3]  Peter Richtárik,et al.  Randomized Block Cubic Newton Method , 2018, ICML.

[4]  Peter Richtárik,et al.  SDNA: Stochastic Dual Newton Ascent for Empirical Risk Minimization , 2015, ICML.

[5]  Martin J. Wainwright,et al.  Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence , 2015, SIAM J. Optim..

[6]  Haipeng Luo,et al.  Efficient Second Order Online Learning by Sketching , 2016, NIPS.

[7]  J. Abatzoglou,et al.  TerraClimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958–2015 , 2018, Scientific Data.

[8]  Nicholas I. M. Gould,et al.  Trust Region Methods , 2000, MOS-SIAM Series on Optimization.

[9]  Peter Richtárik,et al.  Randomized Iterative Methods for Linear Systems , 2015, SIAM J. Matrix Anal. Appl..

[10]  Natalia Maltsev,et al.  WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction , 2000, Nucleic Acids Res..

[11]  Yurii Nesterov,et al.  Relatively Smooth Convex Optimization by First-Order Methods, and Applications , 2016, SIAM J. Optim..

[12]  Yurii Nesterov,et al.  Cubic regularization of Newton method and its global performance , 2006, Math. Program..

[13]  Dimitri P. Bertsekas,et al.  Constrained Optimization and Lagrange Multiplier Methods , 1982 .

[14]  Robert M. Gower,et al.  Stochastic Block BFGS: Squeezing More Curvature out of Data , 2016, ICML.

[15]  James R. Cole,et al.  Ribosomal Database Project: data and tools for high throughput rRNA analysis , 2013, Nucleic Acids Res..

[16]  Peter Richtárik,et al.  Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory , 2017, SIAM J. Matrix Anal. Appl..

[17]  Robert M. Gower,et al.  A new framework for the computation of Hessians , 2012, Optim. Methods Softw..

[18]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[19]  Jorge Nocedal,et al.  On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning , 2011, SIAM J. Optim..

[20]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[21]  Bruce Christianson,et al.  Automatic Hessians by reverse accumulation , 1992 .

[22]  R. Dembo,et al.  INEXACT NEWTON METHODS , 1982 .

[23]  Yurii Nesterov,et al.  Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..

[24]  W. B. Johnson,et al.  Extensions of Lipschitz mappings into Hilbert space , 1984 .

[25]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[26]  Martin Jaggi,et al.  Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients , 2018, ArXiv.

[27]  Luís Torgo,et al.  OpenML: networked science in machine learning , 2014, SKDD.

[28]  Kimon Fountoulakis,et al.  A flexible coordinate descent method , 2015, Computational Optimization and Applications.

[29]  Douglas A. Dodge,et al.  Large-scale seismic signal analysis with Hadoop , 2014, Comput. Geosci..

[30]  Bernard Chazelle,et al.  The Fast Johnson--Lindenstrauss Transform and Approximate Nearest Neighbors , 2009, SIAM J. Comput..

[31]  Yurii Nesterov,et al.  Interior-point polynomial algorithms in convex programming , 1994, Siam studies in applied mathematics.

[32]  Martin J. Wainwright,et al.  Iterative Hessian Sketch: Fast and Accurate Solution Approximation for Constrained Least-Squares , 2014, J. Mach. Learn. Res..

[33]  C. Lee,et al.  Medical big data: promise and challenges , 2017, Kidney research and clinical practice.