Efficient memory-based learning for robot control

This dissertation is about the application of machine learning to robot control. A system which has no initial model of the robot/world dynamics should be able to construct such a model using data received through its sensors, an approach which is formalized here as the SAB (State-Action-Behaviour) control cycle. A method of learning is presented in which all the experiences in the lifetime of the robot are explicitly remembered. The experiences are stored in a manner which permits fast recall of the closest previous experience to any new situation, thus permitting very quick predictions of the effects of proposed actions and, given a goal behaviour, permitting fast generation of a candidate action. The learning can take place in high-dimensional non-linear control spaces with real-valued ranges of variables. Furthermore, the method avoids a number of shortcomings of earlier learning methods in which the controller can become trapped in inadequate performance which does not improve. Also considered is how the system is made resistant to noisy inputs and how it adapts to environmental changes. A well-founded mechanism for choosing actions is introduced which solves the experiment/perform dilemma for this domain with adequate computational efficiency, and with fast convergence to the goal behaviour. The dissertation explains in detail how the SAB control cycle can be integrated into both low and high complexity tasks. The methods and algorithms are evaluated with numerous experiments using both real and simulated robot domains. The final experiment also illustrates how a compound learning task can be structured into a hierarchy of simple learning tasks.
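The memory-based idea described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the class name `ExperienceMemory` and its methods are hypothetical, distances are plain Euclidean, and a linear scan stands in for whatever fast-recall structure the dissertation actually uses. It only shows how one store of (state, action, behaviour) triples can serve both prediction and candidate-action generation.

```python
import math


class ExperienceMemory:
    """Hypothetical store of (state, action, behaviour) experiences."""

    def __init__(self):
        # Every experience is kept explicitly; nothing is summarised away.
        self.experiences = []

    def remember(self, state, action, behaviour):
        self.experiences.append((tuple(state), tuple(action), tuple(behaviour)))

    def predict(self, state, action):
        """Return the behaviour of the stored experience nearest to (state, action)."""
        query = tuple(state) + tuple(action)
        # Linear scan (assumption): a stand-in for a fast nearest-neighbour lookup.
        nearest = min(self.experiences,
                      key=lambda e: math.dist(query, e[0] + e[1]))
        return nearest[2]

    def propose_action(self, state, goal):
        """Return the action of the stored experience nearest to (state, goal behaviour)."""
        query = tuple(state) + tuple(goal)
        nearest = min(self.experiences,
                      key=lambda e: math.dist(query, e[0] + e[2]))
        return nearest[1]


# Toy one-dimensional data, invented purely for illustration.
memory = ExperienceMemory()
memory.remember(state=[0.0], action=[1.0], behaviour=[0.5])
memory.remember(state=[0.0], action=[2.0], behaviour=[1.0])
memory.remember(state=[1.0], action=[1.0], behaviour=[1.5])

predicted = memory.predict(state=[0.1], action=[1.9])
candidate = memory.propose_action(state=[0.0], goal=[0.6])
```

In this toy run the prediction comes from the second stored experience and the candidate action from the first. Both queries assume the memory is non-empty, and the sketch does not address noise, environmental change, or the experiment/perform trade-off mentioned in the abstract.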
