Paper Information - Reinforcement Learning Neural Turing Machines - Revised

The Neural Turing Machine (NTM) is more expressive than all previously considered models because of its external memory. It can be viewed as part of a broader effort to use abstract external Interfaces and to learn a parametric model that interacts with them. The capabilities of a model can be extended by providing it with proper Interfaces that interact with the world. These external Interfaces include memory, a database, a search engine, or a piece of software such as a theorem verifier. Some of these Interfaces are provided by the developers of the model. However, many important existing Interfaces, such as databases and search engines, are discrete. We examine the feasibility of learning models that interact with discrete Interfaces. We investigate the following discrete Interfaces: a memory Tape, an input Tape, and an output Tape. We use a Reinforcement Learning algorithm to train a neural network that interacts with such Interfaces to solve simple algorithmic tasks. Our Interfaces are expressive enough to make our model Turing complete.
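To make the idea of training against a discrete Interface concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy copy task over a single input Tape, a context-free softmax policy over three head moves standing in for the paper's neural network controller, and a REINFORCE-style score-function gradient with a running-average baseline. All names (`Tape`, `MOVES`, `run_episode`, `reinforce_gradient`, `train`) and all hyperparameters are hypothetical.

```python
import math
import random

MOVES = (-1, 0, 1)  # hypothetical discrete head actions: left, stay, right


class Tape:
    """Hypothetical discrete Interface: a list of symbols plus an integer head."""

    def __init__(self, symbols):
        self.cells = list(symbols)
        self.head = 0

    def read(self):
        return self.cells[self.head]

    def move(self, delta):
        # Moves are discrete and clamped to the tape bounds.
        self.head = max(0, min(len(self.cells) - 1, self.head + delta))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(logits, symbols, rng):
    """Emit the symbol under the head, then sample a head move, once per step.

    The reward counts emitted symbols that match the input at the same
    position (a toy copy task, assumed here for illustration).
    """
    tape = Tape(symbols)
    probs = softmax(logits)
    actions, reward = [], 0.0
    for target in symbols:
        if tape.read() == target:
            reward += 1.0
        action = rng.choices(range(len(MOVES)), weights=probs)[0]
        actions.append(action)
        tape.move(MOVES[action])
    return actions, reward


def reinforce_gradient(logits, actions, reward, baseline):
    """Score-function estimate: (reward - baseline) * sum_t grad log pi(a_t)."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a in actions:
        for k in range(len(logits)):
            # d/d logit_k of log softmax(logits)[a] is 1[k == a] - probs[k].
            indicator = 1.0 if k == a else 0.0
            grad[k] += (reward - baseline) * (indicator - probs[k])
    return grad


def train(symbols, episodes=2000, lr=0.05, seed=0):
    """Gradient ascent on expected reward; returns the learned move logits."""
    rng = random.Random(seed)
    logits, baseline = [0.0, 0.0, 0.0], 0.0
    for _ in range(episodes):
        actions, reward = run_episode(logits, symbols, rng)
        grad = reinforce_gradient(logits, actions, reward, baseline)
        logits = [w + lr * g for w, g in zip(logits, grad)]
        baseline += 0.05 * (reward - baseline)  # running-average baseline
    return logits
```

Because the head move is sampled rather than computed by a differentiable operation, the reward cannot be backpropagated through it; the score-function estimator is what lets the policy learn from the discrete choice. Under these assumptions, calling `softmax(train("abcd"))` would be expected to favor the move-right action, since only a head that keeps pace with the input position continues to earn reward.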

Wojciech Zaremba | Ilya Sutskever
