GIANT: Globally Improved Approximate Newton Method for Distributed Optimization

For distributed computing environments, we consider the empirical risk minimization problem and propose a distributed and communication-efficient Newton-type optimization method. At every iteration, each worker locally finds an Approximate NewTon (ANT) direction, which is sent to the main driver. The main driver then averages all the ANT directions received from the workers to form a Globally Improved ANT (GIANT) direction. GIANT is highly communication-efficient and naturally exploits the trade-off between local computation and global communication: more local computation results in fewer overall rounds of communication. Theoretically, we show that GIANT enjoys an improved convergence rate compared with first-order methods and existing distributed Newton-type methods. Further, in sharp contrast with many existing distributed Newton-type methods as well as popular first-order methods, GIANT has the highly advantageous practical feature of involving only one tuning parameter. We conduct large-scale experiments on a computer cluster and empirically demonstrate the superior performance of GIANT.
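
The iteration described above can be sketched compactly. Below is a minimal single-machine sketch, assuming a ridge-regularized logistic-regression objective and simulating the workers as row partitions of the data held in NumPy arrays; the fixed step size `alpha` and the dense solve of each local Newton system are illustrative stand-ins, not the paper's implementation (which solves the local systems inexactly, e.g. with conjugate gradient). The two communication rounds per iteration, gradient aggregation followed by direction averaging, reflect the structure suggested by the abstract.

```python
# Minimal sketch of a GIANT-style iteration (assumptions noted in the text above).
import numpy as np

def local_gradient(w, X_i, y_i, lam):
    # Gradient of the local logistic loss (labels y_i in {-1, +1}) plus l2 term.
    p = 1.0 / (1.0 + np.exp(-y_i * (X_i @ w)))
    return X_i.T @ (-(1.0 - p) * y_i) / len(y_i) + lam * w

def local_ant_direction(w, g, X_i, y_i, lam):
    # Approximate NewTon (ANT) direction: solve the local Newton system H_i p = g.
    p = 1.0 / (1.0 + np.exp(-y_i * (X_i @ w)))
    s = p * (1.0 - p)                                  # per-example curvature weights
    H_i = (X_i.T * s) @ X_i / len(y_i) + lam * np.eye(len(w))
    return np.linalg.solve(H_i, g)                     # stand-in for an inexact CG solve

def giant(partitions, d, lam=1e-3, alpha=1.0, iters=20):
    # partitions: list of (X_i, y_i) blocks, one per simulated worker.
    w = np.zeros(d)
    for _ in range(iters):
        # Communication round 1: aggregate local gradients into the global gradient.
        g = np.mean([local_gradient(w, X, y, lam) for X, y in partitions], axis=0)
        # Communication round 2: average the workers' ANT directions into the GIANT direction.
        p = np.mean([local_ant_direction(w, g, X, y, lam) for X, y in partitions], axis=0)
        w = w - alpha * p                              # alpha plays the role of the single tuning parameter
    return w

# Toy usage: 4 simulated workers, each holding 250 examples in 10 dimensions.
rng = np.random.default_rng(0)
w_true = rng.normal(size=10)
parts = []
for _ in range(4):
    X = rng.normal(size=(250, 10))
    y = np.sign(X @ w_true + 0.1 * rng.normal(size=250))
    parts.append((X, y))
w_hat = giant(parts, d=10)
```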
