Behaviour generation in humanoids by learning potential-based policies from constrained motion

Movement generation that is consistent with observed or demonstrated behaviour is an efficient way to seed movement planning in complex, high-dimensional movement systems such as humanoid robots. We present a method for learning potential-based policies from constrained motion data. In contrast to previous approaches to direct policy learning, our method can combine observations from a variety of contexts in which different constraints are in force, in order to learn the underlying unconstrained policy in the form of its potential function. This allows us to generalise and predict behaviour where novel constraints apply. We demonstrate our approach on systems of varying complexity, including kinematic data from the ASIMO humanoid robot with 22 degrees of freedom.
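The core idea can be illustrated with a minimal, self-contained sketch (Python). This is a toy illustration, not the paper's actual algorithm: it models the unconstrained policy as the negative gradient of a potential, generates observations by pushing that policy through context-dependent null-space projections, and fits the potential by combining data from several constraint contexts. As a simplifying assumption, the constraint projections are treated as known here, which the method itself does not require; all names (`policy`, `null_space_projection`, `x_star`, the quadratic parametrisation) are purely illustrative.

```python
import numpy as np

# Toy sketch: recover a potential-based policy u(x) = -grad phi(x) from
# motions recorded under different constraints. Simplifying assumption:
# the null-space projection of each constraint context is known here.

rng = np.random.default_rng(0)

# Ground-truth potential phi(x) = 0.5 * ||x - x_star||^2, so the
# unconstrained policy is steepest descent towards x_star.
x_star = np.array([0.5, -0.3])

def policy(x):
    return -(x - x_star)

def null_space_projection(a):
    """Projection onto the null space of the constraint a^T u = 0."""
    a = a / np.linalg.norm(a)
    return np.eye(2) - np.outer(a, a)

# Two constraint contexts; each observation only reveals the component of
# the policy that the active constraint lets through.
contexts = [null_space_projection(np.array([1.0, 0.0])),
            null_space_projection(np.array([0.6, 0.8]))]

X, U, N_list = [], [], []
for N in contexts:
    for _ in range(100):
        x = rng.uniform(-1.0, 1.0, size=2)
        X.append(x)
        U.append(N @ policy(x))      # observed constrained velocity
        N_list.append(N)

# Parametrise phi(x) = 0.5 x^T A x + b^T x with A symmetric, so that
# grad phi(x) = A x + b is linear in the parameters [A11, A12, A22, b1, b2].
def constrained_prediction_rows(x, N):
    grad_basis = np.array([[x[0], x[1], 0.0, 1.0, 0.0],
                           [0.0,  x[0], x[1], 0.0, 1.0]])
    return -N @ grad_basis           # predicted constrained velocity per parameter

rows = np.vstack([constrained_prediction_rows(x, N) for x, N in zip(X, N_list)])
targets = np.concatenate(U)
theta, *_ = np.linalg.lstsq(rows, targets, rcond=None)

A_hat = np.array([[theta[0], theta[1]],
                  [theta[1], theta[2]]])
b_hat = theta[3:]
print("recovered attractor:", np.linalg.solve(A_hat, -b_hat))  # approx. x_star
```

With a single constraint context, the component of the gradient removed by the projection is unobservable; it is the combination of contexts with different constraints that makes the underlying potential identifiable, mirroring the paper's use of observations from a variety of constraint contexts.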
