Reinforcement learning for robots using neural networks

Reinforcement learning agents are adaptive, reactive, and self-supervised. The aim of this dissertation is to extend the state of the art of reinforcement learning and enable its application to complex robot-learning problems. In particular, it focuses on two issues. First, learning from sparse and delayed reinforcement signals is hard and in general a slow process, so techniques for reducing learning time must be devised. Second, most existing reinforcement learning methods assume that the world is a Markov decision process, an assumption that is too strong for many robot tasks of interest. This dissertation demonstrates how the slow-learning problem can be overcome and non-Markovian environments can be tackled, making reinforcement learning more practical for realistic robot tasks:

(1) Reinforcement learning can be naturally integrated with artificial neural networks to obtain high-quality generalization, resulting in a significant learning speedup. The neural networks used in this dissertation generalize effectively even in the presence of noise and a large number of binary and real-valued inputs.

(2) Reinforcement learning agents can save many learning trials by using an action model, which can be learned on-line. With a model, an agent can mentally experience the effects of its actions without actually executing them. Experience replay is a simple technique that implements this idea, and it is shown to be effective in reducing the number of action executions required (see the illustrative sketch following this abstract).

(3) Reinforcement learning agents can take advantage of instructive training instances provided by human teachers, resulting in a significant learning speedup. Teaching can also help learning agents avoid local optima during the search for optimal control. Simulation experiments indicate that even a small amount of teaching can save agents many learning trials.

(4) Reinforcement learning agents can significantly reduce learning time by hierarchical learning: they first solve elementary learning problems and then combine the solutions to solve a complex problem. Simulation experiments indicate that a robot with hierarchical learning can solve a complex problem that otherwise could hardly be solved within a reasonable time.

(5) Reinforcement learning agents can deal with a wide range of non-Markovian environments by keeping a memory of their past. Three memory architectures are discussed. They work reasonably well for a variety of simple problems, and one of them is also successfully applied to a nontrivial non-Markovian robot task.

The results of this dissertation rely on computer simulation, including (1) an agent operating in a dynamic and hostile environment and (2) a mobile robot operating in a noisy and non-Markovian environment. The robot simulator is physically realistic. This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning.
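
To make points (1) and (2) concrete, the following Python sketch combines Q-learning with a neural-network Q-function and experience replay. It is a minimal illustration of the general idea, not the dissertation's implementation: the environment interface (reset/step), the one-hidden-layer network, the replay scheme, and all hyperparameters are assumptions chosen for brevity.

import random
from collections import deque

import numpy as np


class QNetwork:
    # A tiny one-hidden-layer network approximating Q(state, action) for a
    # small discrete action set.  The architecture and learning rate are
    # illustrative assumptions, not the dissertation's actual networks.
    # States are assumed to be 1-D numpy arrays of n_inputs features.

    def __init__(self, n_inputs, n_actions, n_hidden=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_inputs, n_hidden))
        self.w2 = rng.normal(0.0, 0.1, (n_hidden, n_actions))
        self.lr = lr

    def forward(self, s):
        h = np.tanh(s @ self.w1)      # hidden-layer activations
        return h, h @ self.w2         # Q-value estimates, one per action

    def predict(self, s):
        return self.forward(s)[1]

    def update(self, s, a, target):
        # One gradient-descent step on the squared error (target - Q(s, a))^2.
        h, q = self.forward(s)
        err = target - q[a]
        grad_out = np.zeros_like(q)
        grad_out[a] = -err                               # dLoss/dQ
        grad_h = (self.w2 @ grad_out) * (1.0 - h ** 2)   # backprop through tanh
        self.w2 -= self.lr * np.outer(h, grad_out)
        self.w1 -= self.lr * np.outer(s, grad_h)


def q_learning_with_replay(env, n_inputs, n_actions, episodes=200,
                           gamma=0.9, epsilon=0.1,
                           memory_size=1000, replays_per_step=4):
    # Q-learning with experience replay: each experience (s, a, r, s') is
    # stored, and stored experiences are repeatedly re-presented to the
    # network, so fewer real action executions are needed.  The environment
    # is assumed to expose reset() -> state and step(a) -> (state, reward, done).
    q = QNetwork(n_inputs, n_actions)
    memory = deque(maxlen=memory_size)

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection: mostly greedy, occasionally random.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = int(np.argmax(q.predict(s)))
            s2, r, done = env.step(a)
            memory.append((s, a, r, s2, done))

            # Learn from the newest experience plus a few replayed ones.
            replayed = random.sample(list(memory),
                                     min(replays_per_step, len(memory)))
            for bs, ba, br, bs2, bdone in [memory[-1]] + replayed:
                target = br if bdone else br + gamma * np.max(q.predict(bs2))
                q.update(bs, ba, target)
            s = s2
    return q

The replay scheme shown here (the newest experience plus a few randomly sampled past ones per step) is only one simple choice; the essential point is that each executed action yields several mental updates of the Q-function rather than one.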
