SGDR: Stochastic Gradient Descent with Warm Restarts
[1] C. M. Reeves,et al. Function minimization by conjugate gradients , 1964, Comput. J..
[2] M. J. D. Powell,et al. Restart procedures for the conjugate gradient method , 1977, Math. Program..
[3] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .
[4] Jorge Nocedal,et al. On the limited memory BFGS method for large scale optimization , 1989, Math. Program..
[5] Kenji Fukumizu,et al. Local minima and plateaus in hierarchical structures of multilayer perceptrons , 2000, Neural Networks.
[6] Duan Li,et al. On Restart Procedures for the Conjugate Gradient Method , 2004, Numerical Algorithms.
[7] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[8] Nikolaus Hansen,et al. Evaluating the CMA Evolution Strategy on Multimodal Test Functions , 2004, PPSN.
[9] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[10] Raymond Ros,et al. Benchmarking the BFGS algorithm on the BBOB-2009 function testbed , 2009, GECCO '09.
[11] Patrick Gallinari,et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent , 2009, J. Mach. Learn. Res..
[12] Nikolaus Hansen,et al. Benchmarking a BI-population CMA-ES on the BBOB-2009 function testbed , 2009, GECCO '09.
[13] Mike Preuss,et al. Niching the CMA-ES via nearest-better clustering , 2010, GECCO '10.
[14] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[15] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[16] Michèle Sebag,et al. Alternative Restart Strategies for CMA-ES , 2012, PPSN.
[17] Brian Kingsbury,et al. New types of deep neural network learning for speech recognition and related applications: an overview , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[18] Yann LeCun,et al. The Loss Surface of Multilayer Networks , 2014, ArXiv.
[19] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.
[20] Trevor Darrell,et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.
[21] Leslie N. Smith,et al. No More Pesky Learning Rate Guessing Games , 2015, ArXiv.
[22] Harm de Vries,et al. RMSProp and equilibrated adaptive learning rates for non-convex optimization , 2015 .
[23] Yoshua Bengio,et al. Equilibrated adaptive learning rates for non-convex optimization , 2015, NIPS.
[24] Ya Le,et al. Tiny ImageNet Visual Recognition Challenge , 2015 .
[25] Emmanuel J. Candès,et al. Adaptive Restart for Accelerated Gradient Schemes , 2012, Foundations of Computational Mathematics.
[26] Mike Preuss,et al. Niching Methods and Multimodal Optimization Performance , 2015 .
[27] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[28] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, AISTATS.
[29] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Restarts , 2016, ArXiv.
[30] Nikos Komodakis,et al. Wide Residual Networks , 2016, BMVC.
[31] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Kilian Q. Weinberger,et al. Deep Networks with Stochastic Depth , 2016, ECCV.
[33] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[34] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.
[35] Leslie N. Smith,et al. Cyclical Learning Rates for Training Neural Networks , 2015, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).
[36] Wolfram Burgard,et al. Deep learning with convolutional neural networks for brain mapping and decoding of movement-related information from the human EEG , 2017, ArXiv.
[37] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Junmo Kim,et al. Deep Pyramidal Residual Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Kilian Q. Weinberger,et al. Snapshot Ensembles: Train 1, get M for free , 2017, ICLR.
[40] Ke Zhang,et al. Residual Networks of Residual Networks: Multilevel Residual Networks , 2016, IEEE Transactions on Circuits and Systems for Video Technology.
[41] Omer Levy,et al. Simulating Action Dynamics with Neural Process Networks , 2018, ICLR.