Bernard Ghanem | Motasem Alfarra | Peter Richtárik | Alyazeed Albasyoni | Slavomír Hanzely
[1] Yang You, et al. Scaling SGD Batch Size to 32K for ImageNet Training, 2017, ArXiv.
[2] Peter Richtárik, et al. SGD and Hogwild! Convergence Without the Bounded Gradients Assumption, 2018, ICML.
[3] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[4] Aurélien Lucchi, et al. Variance Reduced Stochastic Gradient Descent with Neighbors, 2015, NIPS.
[5] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.
[6] V. John Mathews, et al. A stochastic gradient adaptive filter with gradient adaptive step size, 1993, IEEE Trans. Signal Process.
[7] Carlo Luschi, et al. Revisiting Small Batch Training for Deep Neural Networks, 2018, ArXiv.
[8] Ohad Shamir, et al. Better Mini-Batch Algorithms via Accelerated Gradient Methods, 2011, NIPS.
[9] Robert M. Gower, et al. SGD with Arbitrary Sampling: General Analysis and Improved Rates, 2019, ICML.
[10] Wonyong Sung, et al. Fixed-point optimization of deep neural networks with adaptive step size retraining, 2017, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Shiqian Ma, et al. Barzilai-Borwein Step Size for Stochastic Gradient Descent, 2016, NIPS.
[12] Alexander J. Smola, et al. Efficient mini-batch training for stochastic optimization, 2014, KDD.
[13] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[14] Robert M. Gower, et al. Optimal mini-batch and step sizes for SAGA, 2019, ICML.
[15] Peter Richtárik, et al. SEGA: Variance Reduction via Gradient Sketching, 2018, NeurIPS.
[16] Alexander Shapiro, et al. Stochastic Approximation Approach to Stochastic Programming, 2013.
[17] F. Bach, et al. Stochastic quasi-gradient methods: variance reduction via Jacobian sketching, 2018, Mathematical Programming.
[18] Quoc V. Le, et al. Don't Decay the Learning Rate, Increase the Batch Size, 2017, ICLR.
[19] Mark W. Schmidt, et al. Minimizing finite sums with the stochastic average gradient, 2013, Mathematical Programming.
[20] J. Borwein, et al. Two-Point Step Size Gradient Methods, 1988.
[21] David W. Jacobs, et al. Automated Inference with Adaptive Batches, 2017, AISTATS.
[22] H. Robbins. A Stochastic Approximation Method, 1951.
[23] Francis Bach, et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.
[24] K. Steiglitz, et al. Adaptive step size random search, 1968.
[25] Jie Liu, et al. Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting, 2015, IEEE Journal of Selected Topics in Signal Processing.
[26] Peter Richtárik, et al. Don't Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop, 2019, ALT.
[27] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.