Reinforcement learning and its application to control

Learning control involves modifying a controller's behavior to improve its performance as measured by some predefined index of performance (IP). If control actions that improve the IP are known, supervised learning methods, or methods for learning from examples, can be used to train the controller. But when such control actions are not known a priori, appropriate control behavior has to be inferred from observations of the IP. One can distinguish between two classes of methods for training controllers under such circumstances. Indirect methods construct a model of the problem's IP and use that model to obtain training information for the controller. Direct, or model-free, methods instead obtain the requisite training information by perturbing the controlled process and observing the effects on the IP. Although the latter approach, of which reinforcement learning is an example, has a reputation for inefficiency, we argue that for certain types of problems it can yield faster, more reliable learning. Using several control problems as examples, we illustrate how the complexity of constructing a model can often exceed that of solving the original control problem with direct reinforcement learning methods, making indirect methods relatively inefficient. These results indicate the importance of considering direct reinforcement learning methods as tools for learning to solve control problems. We also present several techniques for augmenting the power of reinforcement learning methods: (1) using local models to guide credit assignment among the components of a reinforcement learning system, (2) adapting a procedure from experimental psychology called "shaping" to improve the efficiency of learning, thereby making more complex problems amenable to solution, and (3) a multi-level learning architecture that exploits task decomposability by using previously learned behaviors as primitives for learning more complex tasks.
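To make the direct/indirect distinction concrete, the following is a minimal sketch (my illustration, not code from the paper) of a direct, model-free update in the spirit of stochastic real-valued (SRV) units: the controller perturbs its action with Gaussian noise, observes the resulting IP, and nudges its parameters toward perturbations that beat a running baseline. The one-step scalar plant, the linear controller, and all constants below are assumptions made for illustration only.

```python
# Minimal sketch of direct (model-free) reinforcement learning for control.
# The plant, features, and constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def performance(action, state, setpoint=1.0):
    """IP for one control step: negative squared tracking error."""
    next_output = 0.5 * state + action      # assumed scalar plant dynamics
    return -(setpoint - next_output) ** 2

theta = np.zeros(2)      # linear controller parameters: [gain, bias]
sigma = 0.2              # exploration noise standard deviation
alpha = 0.1              # learning rate
baseline = 0.0           # running estimate of expected performance

for trial in range(5000):
    state = rng.uniform(-1.0, 1.0)
    features = np.array([state, 1.0])
    mean_action = theta @ features
    noise = rng.normal(0.0, sigma)
    action = mean_action + noise             # perturb the controlled process

    r = performance(action, state)           # observe the IP directly

    # Direct credit assignment: move the parameters along the perturbation,
    # weighted by how much the observed IP exceeded the baseline.  No model
    # of the plant or of the IP is constructed at any point.
    theta += alpha * (r - baseline) * noise * features
    baseline += 0.1 * (r - baseline)         # reinforcement-comparison baseline

print("learned [gain, bias]:", theta)        # ideal values here: [-0.5, 1.0]
```

An indirect method would instead fit a model of the plant or of the IP from the same interaction data and then differentiate through (or otherwise exploit) that model to train the controller; the sketch above never builds such a model.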
