Paper information - Reinforcement learning for the adaptive control of perception and action

Reinforcement learning for the adaptive control of perception and action

This dissertation applies reinforcement learning to the adaptive control of active sensory-motor systems. In addition to providing for overt action, active sensory-motor systems support active, selective sensing of the environment. The principal advantage of this active approach to perception is that the agent's internal representation can be made highly task specific, thus avoiding wasteful sensory processing and the representation of irrelevant information. One unavoidable consequence of active perception is that improper control can lead to internal states that confound functionally distinct states in the external world. This phenomenon, called perceptual aliasing, is shown to destabilize existing reinforcement learning algorithms with respect to optimal control. To overcome these difficulties, an approach to adaptive control, called the Consistent Representation (CR) method, is developed. This method is used to construct systems that learn not only the overt actions needed to solve a task, but also where to focus their attention in order to collect necessary sensory information. The principle of the CR method is to separate control into two stages: an identification stage, followed by an overt stage. The identification stage generates the task-specific internal representation that is used by the overt control stage. Adaptive identification is accomplished by a technique that involves the detection and suppression of perceptually aliased internal states. Q-learning is used for adaptive overt control. The technique is then extended to include two cooperative learning mechanisms, called Learning with an External Critic (LEC) and Learning By Watching (LBW), which significantly improve learning. These cooperative mechanisms exploit the presence of helpful agents in the environment to supply auxiliary sources of trial-and-error experience and to decrease the latency between the execution and evaluation of an action.
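As a rough illustration of the perceptual aliasing described in the abstract, the sketch below runs a standard tabular Q-learning update on a toy one-step task. Everything in it is an assumption made for illustration and is not taken from the dissertation: the two world states `W1` and `W2`, the `REWARD` table, the `perceive` mapping, and the parameter values. It does not implement the CR method; it only shows that when two functionally distinct world states share one internal state, the learned action values blur together.

```python
from collections import defaultdict

# Hypothetical one-step task: each world state has a different correct action.
REWARD = {
    ("W1", "left"): 1.0, ("W1", "right"): 0.0,
    ("W2", "left"): 0.0, ("W2", "right"): 1.0,
}

def q_update(q, state, action, reward, next_value, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for one observed transition."""
    q[(state, action)] += alpha * (reward + gamma * next_value - q[(state, action)])

def train(perceive, sweeps=200):
    """Learn action values over internal states produced by `perceive`."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for (world, action), reward in REWARD.items():
            # Episodes end after one step, so the next-state value is 0.
            q_update(q, perceive(world), action, reward, next_value=0.0)
    return q

# Internal state matches the world state: no aliasing.
q_distinct = train(lambda world: world)
# Both world states collapse to one internal state "X": aliased.
q_aliased = train(lambda world: "X")

print("distinct:", dict(q_distinct))
print("aliased: ", dict(q_aliased))
```

Under these assumptions, the unaliased learner separates the good action from the bad one in each state, while the aliased learner ends up with two values near 0.5 for `X`, so neither action can be preferred reliably. Detecting and suppressing internal states of this kind is what the identification stage is described as doing.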

Steven Douglas Whitehead | S. Whitehead
