Robot juggling: implementation of memory-based learning

Issues involved in implementing robot learning for a challenging dynamic task are explored in this article, using a case study from robot juggling. We use a memory-based local modeling approach (locally weighted regression) to represent a learned model of the task to be performed. Statistical tests are given to examine the uncertainty of a model, to optimize its prediction quality, and to deal with noisy and corrupted data. We develop an exploration algorithm that explicitly deals with prediction accuracy requirements during exploration. Using all these ingredients in combination with methods from optimal control, our robot achieves fast real-time learning of the task within 40 to 100 trials.
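As a rough illustration of the memory-based local modeling idea named above, the following is a minimal sketch of locally weighted regression for a one-dimensional input. It is not the article's implementation: the Gaussian kernel, the fixed `bandwidth`, the function name `lwr_predict`, and the toy data are all assumptions made for this example. The stored samples serve as the "memory", and a small weighted linear fit is solved afresh for each query point.

```python
import math

def lwr_predict(xs, ys, query, bandwidth=0.5):
    """Predict y at `query` from stored samples (xs, ys) using a
    locally weighted linear fit. Hypothetical helper for illustration."""
    # Gaussian kernel (an assumed choice): samples near the query get
    # weights close to 1, distant samples get weights close to 0.
    w = [math.exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    sw = sum(w)
    if sw == 0.0:
        raise ValueError("all weights underflowed; increase the bandwidth")

    # Weighted means of inputs and outputs.
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw

    # Closed-form weighted least squares for a line through the weighted means.
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0  # fall back to a weighted average

    return my + slope * (query - mx)

# Usage: the memory is simply the list of samples seen so far.
memory_x = [0.1 * i for i in range(63)]
memory_y = [math.sin(x) for x in memory_x]
estimate = lwr_predict(memory_x, memory_y, query=1.0, bandwidth=0.3)
```

Because no global model is trained, adding a new trial to the memory is just an append. The cost is paid at query time, when every stored sample is weighted against the query.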
