Learning-Rate Annealing Methods for Deep Neural Networks
Kensuke Nakamura | Bilel Derbel | Kyoung-Jae Won | Byung-Woo Hong
[1] Gang Wang, et al. Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] Warren B. Powell, et al. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, 2006, Machine Learning.
[3] Stefano Soatto, et al. Entropy-SGD: biasing gradient descent into wide valleys, 2016, ICLR.
[4] Leon Wenliang Zhong, et al. Fast Stochastic Alternating Direction Method of Multipliers, 2013, ICML.
[5] Roland Vollgraf, et al. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017, ArXiv.
[6] Michael I. Jordan, et al. On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo, 2018, ICML.
[7] Heng Huang, et al. Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization, 2016, AAAI.
[8] Nuno Lourenço, et al. Evolving Learning Rate Optimizers for Deep Neural Networks, 2021, ArXiv.
[9] Quoc V. Le, et al. Don't Decay the Learning Rate, Increase the Batch Size, 2017, ICLR.
[10] Frank Hutter, et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2016, ICLR.
[11] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[12] Michael Unser, et al. Deep Convolutional Neural Network for Inverse Problems in Imaging, 2016, IEEE Transactions on Image Processing.
[13] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[14] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2), 1983.
[15] Jianxiong Xiao, et al. DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[16] Yoshua Bengio, et al. A Walk with SGD, 2018, ArXiv.
[17] H. Robbins. A Stochastic Approximation Method, 1951.
[18] Benjamin Schrauwen, et al. Deep content-based music recommendation, 2013, NIPS.
[19] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.
[20] M. G. Madden, et al. Graph convolutional networks: analysis, improvements and results, 2019, Applied Intelligence.
[21] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.
[22] Dit-Yan Yeung, et al. Collaborative Deep Learning for Recommender Systems, 2014, KDD.
[23] Prateek Jain, et al. On the Insufficiency of Existing Momentum Schemes for Stochastic Optimization, 2018, 2018 Information Theory and Applications Workshop (ITA).
[24] Quanquan Gu, et al. Stochastic Variance-Reduced Hamilton Monte Carlo Methods, 2018, ICML.
[25] Tom Schaul, et al. No more pesky learning rates, 2012, ICML.
[26] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[27] Colin Wei, et al. Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks, 2019, NeurIPS.
[28] David W. Jacobs, et al. Automated Inference with Adaptive Batches, 2017, AISTATS.
[29] Cho-Jui Hsieh, et al. Fast Variance Reduction Method with Stochastic Batch Size, 2018, ICML.
[30] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[31] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[32] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[33] Riccardo La Grassa, et al. Combining Optimization Methods Using an Adaptive Meta Optimizer, 2021, Algorithms.
[34] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Kfir Y. Levy, et al. Online to Offline Conversions, Universality and Adaptive Minibatch Sizes, 2017, NIPS.
[36] Shiyu Chang, et al. Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization, 2018, NeurIPS.
[37] Shu-Ching Chen, et al. T-LRA: Trend-Based Learning Rate Annealing for Deep Neural Networks, 2017, 2017 IEEE Third International Conference on Multimedia Big Data (BigMM).
[38] Javier Romero, et al. Coupling Adaptive Batch Sizes with Learning Rates, 2016, UAI.
[39] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[40] Lei Zhang, et al. Learning a Deep Single Image Contrast Enhancer from Multi-Exposure Images, 2018, IEEE Transactions on Image Processing.
[41] Nicola Conci, et al. How Deep Features Have Improved Event Recognition in Multimedia, 2019, ACM Trans. Multim. Comput. Commun. Appl.
[42] Fanhua Shang, et al. A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates, 2018, ICML.
[43] Tong Zhang, et al. Solving large scale linear prediction problems using stochastic gradient descent algorithms, 2004, ICML.
[44] Jürgen Schmidhuber, et al. Flat Minima, 1997, Neural Computation.
[45] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[46] Heng-Tze Cheng, et al. Wide & Deep Learning for Recommender Systems, 2016, DLRS@RecSys.
[47] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.
[48] Mathieu Lagrange, et al. Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge, 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[49] Andrew Owens, et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features, 2018, ECCV.
[50] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[51] Zeyuan Allen-Zhu, et al. Variance Reduction for Faster Non-Convex Optimization, 2016, ICML.
[52] Florian Metze, et al. A comparison of Deep Learning methods for environmental sound detection, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[53] Yoram Singer, et al. Train faster, generalize better: Stability of stochastic gradient descent, 2015, ICML.
[54] Leslie N. Smith. Cyclical Learning Rates for Training Neural Networks, 2015, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).
[55] Zebang Shen, et al. Adaptive Variance Reducing for Stochastic Gradient Descent, 2016, IJCAI.
[56] Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images, 2009.
[57] Antonio Torralba, et al. SoundNet: Learning Sound Representations from Unlabeled Video, 2016, NIPS.
[58] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Dacheng Tao, et al. Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence, 2019, NeurIPS.
[60] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Mark W. Schmidt, et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets, 2012, NIPS.
[62] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
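
The annealing schedules surveyed in this bibliography are simple to state in code. Below is a minimal, illustrative Python sketch, not the paper's implementation: it compares step decay (the ImageNet-style schedule used in [13] and [34]), smooth exponential decay, and a fixed-period simplification of cosine annealing with warm restarts in the spirit of SGDR [10]. All function names and hyperparameter defaults here are assumptions chosen for the example.

import math

# Illustrative learning-rate schedules; each maps an epoch index to a rate.
# Defaults (base_lr=0.1, etc.) are assumed values for demonstration only.

def step_decay(epoch, base_lr=0.1, drop=0.1, epochs_per_drop=30):
    # Piecewise-constant decay: multiply the rate by `drop` every
    # `epochs_per_drop` epochs (the classic ImageNet-style schedule).
    return base_lr * (drop ** (epoch // epochs_per_drop))

def exponential_decay(epoch, base_lr=0.1, k=0.05):
    # Smooth exponential decay: lr(t) = base_lr * exp(-k * t).
    return base_lr * math.exp(-k * epoch)

def cosine_warm_restarts(epoch, base_lr=0.1, min_lr=1e-4, period=10):
    # Fixed-period cosine annealing with warm restarts, in the spirit of
    # SGDR [10] (SGDR itself lengthens the period after each restart):
    # a half-cosine from base_lr down to min_lr, then a jump back up.
    t = epoch % period  # position within the current restart cycle
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t / period))

if __name__ == "__main__":
    # Print the three schedules side by side over 60 epochs.
    for epoch in range(0, 60, 5):
        print(f"epoch {epoch:2d}  "
              f"step={step_decay(epoch):.4f}  "
              f"exp={exponential_decay(epoch):.4f}  "
              f"cosine={cosine_warm_restarts(epoch):.4f}")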