Combining Optimization Methods Using an Adaptive Meta Optimizer
[1] Jinghui Chen, et al. Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks, 2018, IJCAI.
[2] Michael I. Jordan, et al. Gradient Descent Only Converges to Minimizers, 2016, COLT.
[3] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[4] Wei Zhang, et al. Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks, 2018, NeurIPS.
[5] Vimal K. Shrivastava, et al. Analysis of various optimizers on deep convolutional neural network model in the application of hyperspectral remote sensing image classification, 2019, International Journal of Remote Sensing.
[6] Xu Sun, et al. Adaptive Gradient Methods with Dynamic Bound of Learning Rate, 2019, ICLR.
[7] H. Robbins. A Stochastic Approximation Method, 1951.
[8] Richard Socher, et al. Improving Generalization Performance by Switching from Adam to SGD, 2017, ArXiv.
[9] Samuel R. Bowman, et al. Neural Network Acceptability Judgments, 2018, Transactions of the Association for Computational Linguistics.
[10] Frank Hutter, et al. Fixing Weight Decay Regularization in Adam, 2017, ArXiv.
[11] H. H. Rosenbrock, et al. An Automatic Method for Finding the Greatest or Least Value of a Function, 1960, Comput. J.
[12] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[13] Anders Krogh, et al. A Simple Weight Decay Can Improve Generalization, 1991, NIPS.
[14] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.
[15] Mikhail Belkin, et al. Accelerating SGD with momentum for over-parameterized learning, 2018, ICLR.
[16] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[17] Rachid Guerraoui, et al. Asynchronous Byzantine Machine Learning (the case of SGD) Supplementary Material, 2022.
[18] Zijun Zhang, et al. Improved Adam Optimizer for Deep Neural Networks, 2018, 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS).
[19] Sanjiv Kumar, et al. Adaptive Methods for Nonconvex Optimization, 2018, NeurIPS.
[20] Diogo Almeida, et al. Resnet in Resnet: Generalizing Residual Architectures, 2016, ArXiv.
[21] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[22] Jakub Nalepa, et al. Genetically-trained deep neural networks, 2018, GECCO.