Faster reinforcement learning after pretraining deep networks to predict state dynamics

Deep learning algorithms have recently appeared that pretrain the hidden layers of neural networks in unsupervised ways, leading to state-of-the-art performance on large classification problems. These methods can also pretrain networks used for reinforcement learning. However, unsupervised pretraining ignores the additional information available in the reinforcement learning paradigm: the ongoing sequence of (state, action, next state) tuples. This paper demonstrates that learning a predictive model of state dynamics from these tuples can yield a pretrained hidden-layer structure that reduces the time needed to solve reinforcement learning problems.
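As a rough illustration of the idea, the sketch below pretrains a network to predict the next state from a (state, action) input and then reuses the pretrained hidden layer as the feature layer of a Q-function. This is a minimal sketch assuming a PyTorch-style setup; the dimensions, placeholder data, optimizer settings, and shared-hidden-layer wiring are illustrative assumptions, not the paper's exact architecture or training procedure.

```python
# Minimal sketch of dynamics pretraining followed by reuse for Q-learning.
# All sizes, data, and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HIDDEN = 4, 1, 32

# 1. Gather (state, action, next state) tuples; here, fabricated
#    placeholder data stands in for interaction with an environment.
s = torch.randn(1000, STATE_DIM)
a = torch.randint(0, 2, (1000, ACTION_DIM)).float()
s_next = s + 0.1 * torch.randn_like(s)   # stand-in for true dynamics

# 2. Supervised pretraining: the hidden layer learns features that
#    predict the next state from the current (state, action) input.
hidden = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN), nn.Tanh())
dyn_head = nn.Linear(HIDDEN, STATE_DIM)  # next-state prediction head
opt = torch.optim.Adam(
    list(hidden.parameters()) + list(dyn_head.parameters()), lr=1e-3)
for _ in range(200):
    pred = dyn_head(hidden(torch.cat([s, a], dim=1)))
    loss = nn.functional.mse_loss(pred, s_next)
    opt.zero_grad()
    loss.backward()
    opt.step()

# 3. Reinforcement learning: discard the dynamics head, keep the
#    pretrained hidden layer, and attach a fresh scalar Q-value head.
#    Q-learning then starts from dynamics-informed features rather than
#    random weights, which is the claimed source of the speed-up.
q_head = nn.Linear(HIDDEN, 1)

def q_value(state, action):
    """Q(s, a) computed through the pretrained hidden layer."""
    return q_head(hidden(torch.cat([state, action], dim=1)))
```

The sketch only shows the weight-sharing mechanics; a full agent would collect real transitions, train the Q-head (and optionally fine-tune the hidden layer) with a temporal-difference update, and act via some exploration policy.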
