ASLR: An Adaptive Scheduler for Learning Rate