THE ROLE OF EXPLORATION IN LEARNING CONTROL

Whenever an intelligent agent learns to control an unknown environment, two opposing objectives have to be combined. On the one hand, the environment must be sufficiently explored in order to identify a (sub-)optimal controller. For instance, a robot facing an unknown environment has to spend time moving around and acquiring knowledge. On the other hand, the environment must also be exploited during learning, i.e., experience gained during learning must be taken into account in action selection if one is interested in minimizing the costs of learning. For example, although a robot has to explore its environment, it should avoid collisions with obstacles once it has received negative reward for collisions. For efficient learning, actions should thus be generated in such a way that the environment is explored and pain is avoided. This fundamental trade-off between exploration and exploitation demands efficient exploration capabilities that maximize the effect of learning while minimizing the costs of exploration.
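As a concrete illustration (not taken from the text), the sketch below shows one standard way this trade-off is realized in practice: epsilon-greedy action selection inside tabular Q-learning. The environment interface (reset/step), the action set, and the parameter values are assumptions made only for this example.

import random
from collections import defaultdict

def epsilon_greedy_action(q_values, state, actions, epsilon):
    # With probability epsilon take a random (exploratory) action;
    # otherwise exploit the action with the highest current value estimate.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values[(state, a)])

def q_learning_episode(env, q_values, actions, epsilon=0.1, alpha=0.5, gamma=0.95):
    # Run one episode, balancing exploration against exploitation.
    # env.reset() and env.step() are an assumed interface; reward is
    # assumed to be negative for undesirable events such as collisions.
    state = env.reset()
    done = False
    while not done:
        action = epsilon_greedy_action(q_values, state, actions, epsilon)
        next_state, reward, done = env.step(action)
        # Standard one-step Q-learning update toward the best next-state value.
        best_next = max(q_values[(next_state, a)] for a in actions)
        q_values[(state, action)] += alpha * (reward + gamma * best_next - q_values[(state, action)])
        state = next_state
    return q_values

# Usage sketch: q = defaultdict(float); repeatedly call q_learning_episode(env, q, actions).
# Decaying epsilon over episodes shifts the agent gradually from exploration to exploitation.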
