A novel neural-network gradient optimization algorithm based on reinforcement learning
[1] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[2] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2), 1983.
[3] Ning Qian, et al. On the momentum term in gradient descent learning algorithms, 1999, Neural Networks.
[4] Samy Bengio, et al. On the search for new learning rules for ANNs, 1995, Neural Processing Letters.
[5] Jitendra Malik, et al. Learning to Optimize Neural Nets, 2017, ArXiv.
[6] Timothy Dozat, et al. Incorporating Nesterov Momentum into Adam, 2016.
[7] G. Evans, et al. Learning to Optimize, 2008.
[8] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[9] Jian Li, et al. Learning Gradient Descent: Better Generalization and Longer Horizons, 2017, ICML.
[10] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[11] Magnus Thor Jonsson, et al. Evolution and design of distributed learning rules, 2000, IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks.
[12] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[13] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[14] H. Robbins. A Stochastic Approximation Method, 1951.
[15] Sanjiv Kumar, et al. On the Convergence of Adam and Beyond, 2018.
[16] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[17] E. Kehoe. A layered network model of associative learning: learning to learn and configuration, 1988, Psychological Review.
[18] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[19] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.
[20] Richard J. Mammone, et al. Meta-neural networks that learn by learning, 1992, IJCNN International Joint Conference on Neural Networks.
[21] Peter R. Conwell, et al. Fixed-weight networks can learn, 1990, IJCNN International Joint Conference on Neural Networks.
[22] Daniel Jiwoong Im, et al. An empirical analysis of the optimization of deep network loss surfaces, 2016, ArXiv (1612.04010).
[23] Misha Denil, et al. Learned Optimizers that Scale and Generalize, 2017, ICML.
[24] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[25] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.