[1] K. Chung. On a Stochastic Approximation Method, 1954.
[2] J. Sacks. Asymptotic Distribution of Stochastic Approximation Procedures, 1958.
[3] E. Parzen. Annals of Mathematical Statistics, 1962.
[4] H. Robbins et al. A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications, 1985.
[5] Michael I. Jordan et al. Advances in Neural Information Processing Systems 30, 1995.
[6] Léon Bottou. Online Learning and Stochastic Approximations, 1998.
[7] H. Robbins. A Stochastic Approximation Method, 1951.
[8] Alex Krizhevsky et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[9] Alexander Shapiro et al. Stochastic Approximation approach to Stochastic Programming, 2013.
[10] Alexander J. Smola et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[11] Elad Hazan et al. An optimal algorithm for stochastic strongly-convex optimization, 2010, arXiv:1006.2425.
[12] Stephen J. Wright et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[13] P. Cochat et al. Et al, 2008, Archives de pédiatrie: organe officiel de la Société française de pédiatrie.
[14] Marc'Aurelio Ranzato et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[15] Nicholas I. M. Gould et al. SIAM Journal on Optimization, 2012.
[16] Ohad Shamir et al. Optimal Distributed Online Prediction Using Mini-Batches, 2010, J. Mach. Learn. Res.
[17] Ohad Shamir et al. Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes, 2012, ICML.
[18] Tong Zhang et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[19] Saeed Ghadimi et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[20] Qiang Chen et al. Network In Network, 2013, ICLR.
[21] Yijun Huang et al. Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization, 2015, NIPS.
[22] Yann LeCun et al. Deep learning with Elastic Averaging SGD, 2014, NIPS.
[23] Andrew Zisserman et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[24] Frank Hutter et al. SGDR: Stochastic Gradient Descent with Restarts, 2016, arXiv.
[25] Samy Bengio et al. Revisiting Distributed Synchronous SGD, 2016, arXiv.
[26] Heike Freud et al. On Line Learning In Neural Networks, 2016.
[27] Ioannis Mitliagkas et al. Parallel SGD: When does averaging help?, 2016, arXiv.
[28] Frank Hutter et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2016, ICLR.
[29] Nathan Srebro et al. Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox, 2017, COLT.
[30] Jorge Nocedal et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.