Taming Convergence for Asynchronous Stochastic Gradient Descent with Unbounded Delay in Non-Convex Learning