Learning to exploit dynamics for robot motor coordination

Humans exploit dynamics—gravity, inertia, joint coupling, elasticity, and so on—as a regular part of skillful, coordinated movement. Such movements comprise everyday activities, like reaching and walking, as well as highly practiced maneuvers such as those used in athletics and the performing arts. Robots, especially industrial manipulators, instead use control schemes that ordinarily cancel the complex, nonlinear dynamics that humans use to their advantage. Alternative schemes from the machine learning and intelligent control communities offer a number of potential benefits, such as improved efficiency, online skill acquisition, and tracking of nonstationary environments. However, the success of such methods depends a great deal on structure in the form of simplifying assumptions, prior knowledge, solution constraints, and other heuristics that bias learning.

My premise for this research is that crude kinematic information can supply the initial knowledge needed for learning complex robot motor skills—especially skills that exploit dynamics as humans do. This information is readily available from sources such as a coach or human instructor, from theoretical analysis of a robot mechanism, or from conventional techniques for manipulator control. In this dissertation I investigate how each type of kinematic information can facilitate the learning of efficient “dynamic” skills.

This research is multidisciplinary, with contributions along several dimensions. With regard to biological motor control, I demonstrate that motor synergies, i.e., functional units that exploit dynamics, evolve when trial-and-error learning is applied to a particular model of motor skill acquisition. To analyze the effects of velocity on dynamic skills and motor learning, I derive an extension to the notion of dynamic manipulability, which roboticists use to quantify a robot's capabilities before specification of a task. Along the machine learning dimension, I develop a supervised actor-critic architecture for learning a standard of correctness from a conventional controller while improving upon it through trial-and-error learning. Examples with both simulated and real manipulators demonstrate the benefits this research holds for the development of skillful, coordinated robots.
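To make the manipulability contribution concrete, the following is a brief sketch, in standard robotics notation, of Yoshikawa's dynamic manipulability ellipsoid and the state-dependent offset that velocity and gravity terms introduce. The dissertation's exact extension is not reproduced here; treat the offset expression as an illustrative assumption.

```latex
% Manipulator dynamics and task-space kinematics (standard notation):
\[
  M(q)\,\ddot{q} + c(q,\dot{q}) + g(q) = \tau, \qquad
  \ddot{x} = J(q)\,\ddot{q} + \dot{J}(q)\,\dot{q}.
\]
% Neglecting the velocity and gravity terms, the unit torque ball
% \|\tau\| \le 1 maps to Yoshikawa's dynamic manipulability ellipsoid:
\[
  \ddot{x}^{\top}\!\left( J M^{-1} M^{-\top} J^{\top} \right)^{-1} \ddot{x} \;\le\; 1.
\]
% Retaining those terms shows how state biases the achievable accelerations:
\[
  \ddot{x} \;=\; J M^{-1}\!\left( \tau - c(q,\dot{q}) - g(q) \right) + \dot{J}\,\dot{q},
\]
% so at nonzero \dot{q} the set of reachable \ddot{x} is no longer centered
% at the origin; this is the velocity effect the extension quantifies.
```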
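Since the abstract only names the supervised actor-critic architecture, the following is a minimal sketch of the general idea, assuming a toy one-dimensional point-mass task, linear function approximators, a hand-tuned proportional-derivative "supervisor" standing in for a conventional controller, and an illustrative gain schedule k that shifts control from supervisor to actor. None of these specifics come from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

def supervisor(state):
    """Conventional controller: crude PD feedback toward the origin (assumed gains)."""
    pos, vel = state
    return -1.0 * pos - 0.5 * vel

def step(state, action, dt=0.05):
    """Point-mass dynamics: force in, (position, velocity) out, quadratic cost as reward."""
    pos, vel = state
    vel = vel + dt * action
    pos = pos + dt * vel
    reward = -(pos**2 + 0.1 * action**2)
    return np.array([pos, vel]), reward

def features(state):
    pos, vel = state
    return np.array([pos, vel, 1.0])

w_critic = np.zeros(3)                       # value-function weights
w_actor = np.zeros(3)                        # policy weights
alpha_c, alpha_a, gamma, sigma = 0.1, 0.01, 0.95, 0.2

for episode in range(200):
    state = np.array([rng.uniform(-1, 1), 0.0])
    k = max(0.0, 1.0 - episode / 100.0)      # hand control from supervisor to actor
    for t in range(100):
        phi = features(state)
        a_super = supervisor(state)
        a_actor = w_actor @ phi + sigma * rng.normal()   # exploratory actor action
        action = k * a_super + (1.0 - k) * a_actor       # blended composite action

        next_state, reward = step(state, action)
        # TD error from the critic drives both critic and actor updates.
        delta = reward + gamma * (w_critic @ features(next_state)) - w_critic @ phi
        w_critic += alpha_c * delta * phi

        # Supervised term pulls the actor toward the conventional controller;
        # the RL term reinforces exploratory deviations with positive TD error.
        w_actor += alpha_a * (k * (a_super - w_actor @ phi) * phi
                              + (1.0 - k) * delta * (a_actor - w_actor @ phi) * phi)
        state = next_state
```

The blending of actions, rather than pure imitation, is the point of the design: the learner first matches the conventional controller's standard of correctness, then uses the critic's feedback to improve on it through trial and error.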
