Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks
Yanzhao Wu | Calton Pu | Wenqi Wei | Lei Yu | Arun Iyengar | Ling Liu | Qi Zhang | Juhyun Bae | Ka-Ho Chow
[1] John C. Duchi, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[2] Alex Krizhevsky, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[3] Xavier Glorot, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[4] Yann LeCun, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[5] Yangqing Jia, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[6] Sebastian Ruder. An overview of gradient descent optimization algorithms, 2016, ArXiv.
[7] Thomas M. Breuel, et al. The Effects of Hyperparameters on SGD Training of Neural Networks, 2015, ArXiv.
[8] Leslie N. Smith, et al. Super-convergence: very fast training of neural networks using large learning rates, 2018, Defense + Commercial Sensing.
[9] Ilya Sutskever, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[10] Yoshua Bengio, et al. Practical Recommendations for Gradient-Based Training of Deep Architectures, 2012, Neural Networks: Tricks of the Trade.
[11] Ning Qian, et al. On the momentum term in gradient descent learning algorithms, 1999, Neural Networks.
[12] Leslie N. Smith, et al. Cyclical Learning Rates for Training Neural Networks, 2015, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).
[13] Ilya Loshchilov, et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2016, ICLR.
[14] Ling Liu, et al. A Comparative Measurement Study of Deep Learning as a Service Framework, 2018, IEEE Transactions on Services Computing.
[15] Yihui He, et al. Channel Pruning for Accelerating Very Deep Neural Networks, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[16] Léon Bottou, et al. Stochastic Gradient Descent Tricks, 2012, Neural Networks: Tricks of the Trade.
[17] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[18] Diederik P. Kingma, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[19] Frank Hutter, et al. Sequential Model-Based Optimization for General Algorithm Configuration, 2011, LION.
[20] Padhraic Smyth, et al. Hot Swapping for Online Adaptation of Optimization Hyperparameters, 2015, ICLR.
[21] Cody Coleman, et al. DAWNBench: An End-to-End Deep Learning Benchmark and Competition, 2017.
[22] Yanzhao Wu, et al. Benchmarking Deep Learning Frameworks: Design Considerations, Metrics and Beyond, 2018, 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS).
[23] Kaiming He, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Tom Schaul, et al. No more pesky learning rates, 2012, ICML.
[25] Takuya Akiba, et al. Optuna: A Next-generation Hyperparameter Optimization Framework, 2019, KDD.
[26] Yanzhao Wu, et al. Experimental Characterizations and Analysis of Deep Learning Frameworks, 2018, 2018 IEEE International Conference on Big Data (Big Data).
[27] Salman H. Khan, et al. Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data, 2015, IEEE Transactions on Neural Networks and Learning Systems.
[28] Priya Goyal, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.