Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates
[1] Eric Moulines, et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), 2013, NIPS.
[2] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.
[3] J. Sacks. Asymptotic Distribution of Stochastic Approximation Procedures, 1958.
[4] Vivak Patel. On SGD's Failure in Practice: Characterizing and Overcoming Stalling, 2017, ArXiv.
[5] S. Dereich, et al. General multilevel adaptations for stochastic approximation algorithms of Robbins–Monro and Polyak–Ruppert type, 2015, Numerische Mathematik.
[6] K. Ritter, et al. Minimal Errors for Strong and Weak Approximation of Stochastic Differential Equations, 2008.
[7] Ohad Shamir, et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization, 2011, ICML.
[8] John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[9] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] G. Kersting. A Weak Convergence Theorem with Application to the Robbins-Monro Process, 1978.
[11] Léon Bottou, et al. A Lower Bound for the Optimization of Finite Sums, 2014, ICML.
[12] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.
[13] Sebastian Ruder, et al. An overview of gradient descent optimization algorithms, 2016, Vestnik komp'iuternykh i informatsionnykh tekhnologii.
[14] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Review.
[15] F. Bach, et al. Bridging the gap between constant step size stochastic gradient descent and Markov chains, 2017, The Annals of Statistics.
[16] H. Robbins. A Stochastic Approximation Method, 1951.
[17] K. Chung. On a Stochastic Approximation Method, 1954.
[18] E Weinan, et al. Dynamics of Stochastic Gradient Algorithms, 2015, ArXiv.
[19] P. Révész, et al. A limit theorem for the Robbins-Monro approximation, 1973.
[20] Maxim Raginsky, et al. Information-Based Complexity, Feedback and Dynamics in Convex Programming, 2010, IEEE Transactions on Information Theory.
[21] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Communications of the ACM.
[22] Yi Zhou, et al. An optimal randomized incremental gradient method, 2015, Mathematical Programming.
[23] E Weinan, et al. Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms, 2015, ICML.
[24] Martin J. Wainwright, et al. Information-theoretic lower bounds on the oracle complexity of convex optimization, 2009, NIPS.
[25] Noboru Murata, et al. A Statistical Study on On-line Learning, 1999.
[26] Lam M. Nguyen, et al. When Does Stochastic Gradient Algorithm Work Well?, 2018, ArXiv.
[27] Philippe von Wurstemberger, et al. Strong error analysis for stochastic gradient descent optimization algorithms, 2018, arXiv:1801.09324.
[28] Léon Bottou, et al. The Tradeoffs of Large Scale Learning, 2007, NIPS.
[29] Yann LeCun, et al. Large Scale Online Learning, 2003, NIPS.
[30] Marten van Dijk, et al. Tight Dimension Independent Lower Bound on the Expected Convergence Rate for Diminishing Step Sizes in SGD, 2018, NeurIPS.
[31] Tzay Y. Young, et al. Error bounds for stochastic estimation of signal parameters, 1971, IEEE Transactions on Information Theory.
[32] V. Fabian. On Asymptotic Normality in Stochastic Approximation, 1968.
[33] Eric Moulines, et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, 2011, NIPS.
[34] Nathan Srebro, et al. Tight Complexity Bounds for Optimizing Composite Objectives, 2016, NIPS.
[35] Huy N. Chau, et al. On fixed gain recursive estimators with discontinuity in the parameters, 2016, ESAIM: Probability and Statistics.
[36] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.