On Convergence of Distributed Approximate Newton Methods: Globalization, Sharper Bounds and Beyond
[1] Nathan Srebro, et al. Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox, 2017, COLT.
[2] Ohad Shamir, et al. Stochastic Convex Optimization, 2009, COLT.
[3] Mladen Kolar, et al. Efficient Distributed Learning with Sparsity, 2016, ICML.
[4] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[5] Shusen Wang, et al. GIANT: Globally Improved Approximate Newton Method for Distributed Optimization, 2017, NeurIPS.
[6] Steve R. Gunn, et al. Result Analysis of the NIPS 2003 Feature Selection Challenge, 2004, NIPS.
[7] Yaoliang Yu, et al. Petuum: A New Platform for Distributed Machine Learning on Big Data, 2015, IEEE Trans. Big Data.
[8] Ohad Shamir, et al. Without-Replacement Sampling for Stochastic Gradient Methods, 2016, NIPS.
[9] Yuchen Zhang, et al. DiSCO: Distributed Optimization for Self-Concordant Empirical Loss, 2015, ICML.
[10] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[11] Ping Li, et al. Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems, 2020, MLSys.
[12] Michael I. Jordan, et al. A Lyapunov Analysis of Momentum Methods in Optimization, 2016, ArXiv.
[13] Lin Xiao, et al. Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization, 2020, ICML.
[14] Ohad Shamir, et al. Communication-Efficient Distributed Optimization using an Approximate Newton-type Method, 2013, ICML.
[15] Yuchen Zhang, et al. Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization, 2014, ICML.
[16] Peter Richtárik, et al. Distributed Coordinate Descent Method for Learning with Big Data, 2016.
[17] Bo Liu. Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning, 2017.
[18] Zaïd Harchaoui, et al. A Universal Catalyst for First-Order Optimization, 2015, NIPS.
[19] Jingyuan Zhang, et al. AIBox: CTR Prediction Model Training on a Single Node, 2019, CIKM.
[20] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, 2011, Found. Trends Mach. Learn.
[21] Peter Richtárik, et al. Linearly convergent stochastic heavy ball method for minimizing generalization error, 2017, ArXiv.
[22] Alexander J. Smola, et al. Communication Efficient Distributed Machine Learning with the Parameter Server, 2014, NIPS.
[23] Peter Richtárik, et al. Federated Optimization: Distributed Machine Learning for On-Device Intelligence, 2016, ArXiv.
[24] Yiming Yang, et al. RCV1: A New Benchmark Collection for Text Categorization Research, 2004, J. Mach. Learn. Res.
[25] Alexander J. Smola, et al. AIDE: Fast and Communication Efficient Distributed Optimization, 2016, ArXiv.
[26] Ning Qian, et al. On the momentum term in gradient descent learning algorithms, 1999, Neural Networks.
[27] Ping Li, et al. Understanding and Detecting Convergence for Stochastic Gradient Descent with Momentum, 2020, 2020 IEEE International Conference on Big Data (Big Data).
[28] Joel A. Tropp, et al. User-Friendly Tail Bounds for Sums of Random Matrices, 2010, Found. Comput. Math.
[29] Jiashi Feng, et al. Efficient Stochastic Gradient Hard Thresholding, 2018, NeurIPS.
[30] Yun Yang, et al. Communication-Efficient Distributed Statistical Inference, 2016, Journal of the American Statistical Association.
[31] Reynold Xin, et al. Apache Spark, 2016.
[32] Michael I. Jordan, et al. Adding vs. Averaging in Distributed Primal-Dual Optimization, 2015, ICML.
[33] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.
[34] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.
[35] Hiroyuki Kasai. SGDLibrary: A MATLAB library for stochastic optimization algorithms, 2017, J. Mach. Learn. Res.
[36] Michael I. Jordan, et al. CoCoA: A General Framework for Communication-Efficient Distributed Optimization, 2016, J. Mach. Learn. Res.
[37] A. Montanari, et al. The landscape of empirical risk for nonconvex losses, 2016, The Annals of Statistics.
[38] Euhanna Ghadimi, et al. Global convergence of the Heavy-ball method for convex optimization, 2014, 2015 European Control Conference (ECC).
[39] Thomas Hofmann, et al. Communication-Efficient Distributed Dual Coordinate Ascent, 2014, NIPS.
[40] Weizhu Chen, et al. DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization, 2017, J. Mach. Learn. Res.
[41] Tianbao Yang, et al. Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement, 2017.
[42] Ping Li, et al. Toward Communication Efficient Adaptive Gradient Method, 2020, FODS.
[43] Ohad Shamir, et al. Communication Complexity of Distributed Convex Learning and Optimization, 2015, NIPS.