Paper Information - Sequential Constant Size Compressors for Reinforcement Learning

Traditional Reinforcement Learning (RL) methods are insufficient for AGIs, which must be able to learn to deal with Partially Observable Markov Decision Processes (POMDPs). We investigate a novel method for dealing with this problem: standard RL techniques that take as input the hidden-layer output of a Sequential Constant-Size Compressor (SCSC). The SCSC takes the form of a sequential Recurrent Auto-Associative Memory (RAAM), trained through standard back-propagation. Results illustrate the feasibility of this approach: the system learns to deal with high-dimensional visual observations (up to 640 pixels) in partially observable environments where there are long time lags (up to 12 steps) between relevant sensory information and the necessary action.
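The compressor idea can be illustrated with a minimal sketch. It is not the authors' implementation: the single tanh layer, the weight initialisation, the code size of 16, and every name (`SequentialRAAM`, `encode_step`, `compress`, `reconstruction_loss`) are assumptions made for illustration. The sketch shows only the forward pass of a sequential RAAM and the reconstruction error that back-propagation would minimise; the training loop and the RL learner are left out.

```python
import numpy as np


class SequentialRAAM:
    """Untrained sequential RAAM sketch (hypothetical names and sizes).

    Encoder: (observation, previous code) -> new code of fixed size.
    Decoder: code -> reconstructed (observation, previous code).
    """

    def __init__(self, obs_size, code_size, seed=0):
        rng = np.random.default_rng(seed)
        in_size = obs_size + code_size
        self.obs_size = obs_size
        self.code_size = code_size
        # Small random weights; a real system would learn these by back-propagation.
        self.W_enc = rng.normal(0.0, 0.1, (code_size, in_size))
        self.b_enc = np.zeros(code_size)
        self.W_dec = rng.normal(0.0, 0.1, (in_size, code_size))
        self.b_dec = np.zeros(in_size)

    def encode_step(self, obs, prev_code):
        # Fold one observation into the running fixed-size code.
        x = np.concatenate([obs, prev_code])
        return np.tanh(self.W_enc @ x + self.b_enc)

    def decode_step(self, code):
        # Reconstruct the observation and the previous code from the current code.
        y = np.tanh(self.W_dec @ code + self.b_dec)
        return y[:self.obs_size], y[self.obs_size:]

    def compress(self, observations):
        # Compress a whole history into one vector of length code_size.
        code = np.zeros(self.code_size)
        for obs in observations:
            code = self.encode_step(obs, code)
        return code

    def reconstruction_loss(self, observations):
        # Mean squared auto-association error, averaged over time steps.
        code = np.zeros(self.code_size)
        total = 0.0
        for obs in observations:
            target = np.concatenate([obs, code])
            code = self.encode_step(obs, code)
            recon = np.concatenate(self.decode_step(code))
            total += np.mean((recon - target) ** 2)
        return total / len(observations)


# Illustrative usage: 640-pixel observations, histories of 3 and 12 steps.
raam = SequentialRAAM(obs_size=640, code_size=16)
data_rng = np.random.default_rng(1)
short_history = data_rng.uniform(-1.0, 1.0, (3, 640))
long_history = data_rng.uniform(-1.0, 1.0, (12, 640))
state_short = raam.compress(short_history)
state_long = raam.compress(long_history)
loss = raam.reconstruction_loss(long_history)
```

Both `state_short` and `state_long` have the same length regardless of how many observations were folded in, which is what would let a standard RL method treat the code as an ordinary fixed-size state vector.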
