Neuronlike adaptive elements that can solve difficult learning control problems

It is shown how a system consisting of two neuronlike adaptive elements can solve a difficult learning control problem. The task is to balance a pole that is hinged to a movable cart by applying forces to the cart's base. It is argued that the learning problems faced by adaptive elements that are components of adaptive networks are at least as difficult as this version of the pole-balancing problem. The learning system consists of a single associative search element (ASE) and a single adaptive critic element (ACE). In the course of learning to balance the pole, the ASE constructs associations between input and output by searching under the influence of reinforcement feedback, and the ACE constructs a more informative evaluation function than reinforcement feedback alone can provide. The differences between this approach and other attempts to solve problems using neuronlike elements are discussed, as is the relation of this work to classical and instrumental conditioning in animal learning studies and its possible implications for research in the neurosciences.
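The division of labor between the two elements can be illustrated with a minimal sketch. Nothing below is given in the abstract: the one-hot state coding, the trace-based update rules, the parameter values, and the names (`AseAce`, `act`, `learn`, `reset_traces`) are all assumptions chosen for illustration, and the cart-pole dynamics are omitted. The ASE picks one of two actions from a noisy weighted sum, and the ACE turns the raw reinforcement into an internal signal by comparing successive predictions.

```python
import random


class AseAce:
    """Illustrative ASE/ACE pair over one-hot states (assumed design)."""

    def __init__(self, n_states, alpha=1000.0, beta=0.5, gamma=0.95,
                 delta=0.9, lam=0.8, noise_sd=0.01, seed=0):
        self.w = [0.0] * n_states     # ASE action weights
        self.v = [0.0] * n_states     # ACE prediction weights
        self.e = [0.0] * n_states     # ASE eligibility traces
        self.xbar = [0.0] * n_states  # ACE input traces
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.delta, self.lam, self.noise_sd = delta, lam, noise_sd
        self.rng = random.Random(seed)
        self.prev_p = 0.0             # ACE prediction for the last state

    def reset_traces(self):
        """Clear traces at the start of a new trial."""
        self.e = [0.0] * len(self.e)
        self.xbar = [0.0] * len(self.xbar)
        self.prev_p = 0.0

    def act(self, state):
        """ASE: choose +1 or -1 from a noisy weight, then update traces."""
        noise = self.rng.gauss(0.0, self.noise_sd)
        y = 1 if self.w[state] + noise >= 0 else -1
        for i in range(len(self.e)):
            self.e[i] *= self.delta
            self.xbar[i] *= self.lam
        self.e[state] += (1.0 - self.delta) * y
        self.xbar[state] += 1.0 - self.lam
        self.prev_p = self.v[state]
        return y

    def learn(self, reward, next_state=None):
        """ACE: form internal reinforcement; both elements update weights.

        next_state=None marks a failure, where the prediction is taken as 0.
        """
        p_next = 0.0 if next_state is None else self.v[next_state]
        r_hat = reward + self.gamma * p_next - self.prev_p
        for i in range(len(self.w)):
            self.w[i] += self.alpha * r_hat * self.e[i]
            self.v[i] += self.beta * r_hat * self.xbar[i]
        return r_hat


agent = AseAce(n_states=3)
first_action = agent.act(0)
internal_r = agent.learn(reward=-1.0, next_state=None)  # failure signal
```

After a single failure in state 0, the critic's prediction for that state becomes negative and the action weight is pushed away from the action that was taken, so the opposite action is favored on the next visit.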
