Robot Skill Learning Through Intelligent Experimentation

In robot skill learning, the robot must obtain training data by executing expensive practice trials and recording their results. The thesis is that the high cost of acquiring training data is the limiting factor in the performance of skill learners. Since the data comes from practice trials, it is important that the system make intelligent choices about which actions to attempt while practicing. In this dissertation we present several algorithms for intelligent experimentation in skill learning.

In open-loop skills, the execution goal is presented and the controller must then choose all the control signals for the duration of the task. Learning is a high-dimensional search problem: the system must associate a sequence of actions with each commandable goal. We propose an algorithm that selects the practice actions most likely to improve performance by making use of information gained on previous trials. On the problem of learning to throw a ball with a flexible-link robot, the algorithm takes only 100 trials to find a "whipping" motion for long throws.

Most closed-loop learners improve their performance by gradient descent on a cost function. The main drawback of this method is convergence to non-optimal local minima. We introduce the concept of cooperation as a means of escaping these local minima. We assume the existence of several coaches, each of which improves some aspect of the controller's performance. Switching training between coaches can help the controller avoid locally minimal solutions. On the task of curve tracing with an inverted pendulum, the cooperative algorithm learns to track faster than a traditional method does.

In an integrated system with scarce sensor resources, it is preferable to perform tasks without sensing. We observe that closed-loop learning can function as an efficient search technique for open-loop control. Our system starts with closed-loop learning. As it improves its ability to control the plant, it replaces sensor information with estimates. The result is a controller that tracks long segments of a reference curve open loop.
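The experiment-selection idea behind the open-loop work can be illustrated in miniature: fit a model to the (action, outcome) pairs from previous trials, then choose the next practice action whose predicted outcome lies closest to the goal. This is only a hedged one-dimensional sketch with a toy polynomial model; the function names and the throwing task below are illustrative, not the dissertation's actual algorithm.

```python
import numpy as np

def select_next_action(actions, outcomes, goal, candidates):
    """Pick the candidate action predicted to land nearest the goal,
    using a simple polynomial model fit to the trials so far."""
    coeffs = np.polyfit(actions, outcomes, deg=2)      # model from past trials
    predicted = np.polyval(coeffs, candidates)         # predict each candidate
    return candidates[np.argmin(np.abs(predicted - goal))]

# Toy task: throw distance as a (noisy) function of release speed.
rng = np.random.default_rng(0)
trial_actions = np.array([0.5, 1.0, 1.5, 2.0])                    # speeds tried
trial_outcomes = 3.0 * trial_actions**2 + rng.normal(0, 0.05, 4)  # distances seen
goal_distance = 12.0
candidates = np.linspace(0.0, 3.0, 301)

best = select_next_action(trial_actions, trial_outcomes, goal_distance, candidates)
print(best)  # near 2.0, since the noiseless model gives 3 * 2^2 = 12
```

Each real trial of the chosen action would then be added to the data set, sharpening the model exactly where the goal lies, which is why informed selection can need far fewer trials than uninformed search.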

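The cooperation idea can also be sketched in one dimension: plain gradient descent on one coach's cost gets trapped in a local minimum, but briefly handing training to a second coach with a different cost moves the parameter out of the trap, after which the first coach converges to a better solution. The two toy cost functions and the fixed switching schedule below are hypothetical illustrations, not the dissertation's coaches.

```python
def numeric_grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def descend(f, x, lr=0.05, steps=100):
    """Plain gradient descent on cost f from starting point x."""
    for _ in range(steps):
        x -= lr * numeric_grad(f, x)
    return x

# Coach A's cost has a local minimum near x = -1.35 and a deeper (global)
# minimum near x = 1.47; coach B's cost simply pulls x toward +2.
coach_a = lambda x: x**4 - 4 * x**2 - x
coach_b = lambda x: (x - 2.0) ** 2

# Training on coach A alone from x = -2 gets stuck in the local minimum.
stuck = descend(coach_a, -2.0)

# Cooperative schedule: A until it stalls, a short nudge from B, then A again.
x = descend(coach_a, -2.0)          # converges to the local minimum near -1.35
x = descend(coach_b, x, steps=30)   # coach B drags x toward +2
x_final = descend(coach_a, x)       # A now settles into the global minimum

print(stuck, x_final)  # x_final reaches a lower cost than the stuck solution
```

The switch need not be hand-scheduled as it is here; the point is only that a second cost function can reshape the landscape enough to free the learner from a locally minimal solution.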