Neural Optimizer Search with Reinforcement Learning

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain-specific language that describes a mathematical update equation built from a list of primitive functions, such as the gradient and the running average of the gradient. The controller is trained with Reinforcement Learning to maximize the performance of a model after a few epochs. On CIFAR-10, our method discovers several update rules that outperform many commonly used optimizers, such as Adam, RMSProp, and SGD with and without momentum, on a ConvNet model. We introduce two new optimizers, named PowerSign and AddSign, which we show transfer well and improve training on a variety of different tasks and architectures, including ImageNet classification and Google's neural machine translation system.
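
As an illustration of the kind of update rules the search produces, the following is a minimal NumPy sketch of PowerSign and AddSign as they are commonly written; the constants used here (base e for PowerSign, alpha = 1 for AddSign), the momentum decay beta, and the function names are illustrative assumptions rather than details stated in the abstract. Both rules compare the sign of the current gradient with the sign of its running average, enlarging the step when the two agree and shrinking it when they disagree.

import numpy as np

def powersign_update(w, g, m, lr=0.01, base=np.e, beta=0.9):
    """One PowerSign step (sketch): scale the gradient by base ** (sign(g) * sign(m))."""
    m = beta * m + (1.0 - beta) * g                 # running average of the gradient
    w = w - lr * base ** (np.sign(g) * np.sign(m)) * g
    return w, m

def addsign_update(w, g, m, lr=0.01, alpha=1.0, beta=0.9):
    """One AddSign step (sketch): scale the gradient by (alpha + sign(g) * sign(m))."""
    m = beta * m + (1.0 - beta) * g                 # running average of the gradient
    w = w - lr * (alpha + np.sign(g) * np.sign(m)) * g
    return w, m

With these illustrative defaults, sign agreement multiplies the step by roughly 2.7 for PowerSign and exactly 2 for AddSign, while disagreement shrinks it to about 0.37 or to 0, respectively.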
