Reinforcement learning for the adaptive control of perception and action

This dissertation applies reinforcement learning to the adaptive control of active sensory-motor systems. In addition to providing for overt action, active sensory-motor systems support active, selective sensing of the environment. The principal advantage of this active approach to perception is that the agent's internal representation can be made highly task specific, which avoids wasteful sensory processing and the representation of irrelevant information. One unavoidable consequence of active perception is that improper control can lead to internal states that confound functionally distinct states in the external world. This phenomenon, called perceptual aliasing, is shown to destabilize existing reinforcement learning algorithms with respect to optimal control. To overcome these difficulties, an approach to adaptive control called the Consistent Representation (CR) method is developed. This method is used to construct systems that learn not only the overt actions needed to solve a task, but also where to focus their attention in order to collect necessary sensory information. The principle of the CR method is to separate control into two stages: an identification stage, followed by an overt stage. The identification stage generates the task-specific internal representation that is used by the overt control stage. Adaptive identification is accomplished by a technique that involves the detection and suppression of perceptually aliased internal states. Q-learning is used for adaptive overt control. The technique is then extended to include two cooperative learning mechanisms, Learning with an External Critic (LEC) and Learning By Watching (LBW), which significantly improve learning. These cooperative mechanisms exploit the presence of helpful agents in the environment to supply auxiliary sources of trial-and-error experience and to decrease the latency between the execution and evaluation of an action.
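
As a rough illustration of why perceptual aliasing matters for Q-learning, the following minimal Python sketch runs tabular Q-learning on a small corridor whose table is indexed by the agent's internal state rather than by the true world state. It is not the dissertation's implementation and does not implement the CR method; the corridor task, the `observe` mapping, the function names `q_learning` and `greedy`, and all parameter values are assumptions chosen only for illustration.

```python
import random


def q_learning(observe, n_cells=5, goal=2, episodes=2000, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    The Q-table is indexed by observe(cell), the agent's internal state,
    not by the true cell. Actions: 0 = step left, 1 = step right.
    Reaching the goal cell yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    starts = [c for c in range(n_cells) if c != goal]
    q = {}  # internal state -> [Q(left), Q(right)]
    for _ in range(episodes):
        cell = rng.choice(starts)
        for _ in range(max_steps):  # cap episode length so every run halts
            row = q.setdefault(observe(cell), [0.0, 0.0])
            # Epsilon-greedy action selection, with ties broken at random.
            if rng.random() < epsilon or row[0] == row[1]:
                action = rng.randrange(2)
            else:
                action = 0 if row[0] > row[1] else 1
            # Deterministic move, clamped at the corridor walls.
            nxt = min(max(cell + (1 if action else -1), 0), n_cells - 1)
            done = nxt == goal
            reward = 1.0 if done else 0.0
            future = 0.0 if done else max(q.setdefault(observe(nxt), [0.0, 0.0]))
            # Standard one-step Q-learning update.
            row[action] += alpha * (reward + gamma * future - row[action])
            if done:
                break
            cell = nxt
    return q


def greedy(q):
    """Return the greedy action for each internal state in the table."""
    return {state: (0 if row[0] > row[1] else 1) for state, row in q.items()}
```

With the identity mapping `lambda cell: cell`, every cell has its own table row and the greedy policy should point toward the goal from both sides. With the assumed aliasing map `lambda cell: abs(cell - 2)`, cells 1 and 3 share one internal state even though they require opposite actions, so no single greedy action for that state can be correct in both cells. This is the kind of inconsistency that an identification stage would need to detect and suppress.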
