Methods for Learning Control Policies from Variable-Constraint Demonstrations

Many everyday human skills can be framed as performing some task subject to constraints imposed by the task or the environment. Such constraints are usually unobservable and frequently change between contexts. In this chapter, we explore the problem of learning control policies from data containing variable, dynamic and non-linear constraints on motion. We argue that an effective approach is to learn the underlying unconstrained policy in a form that is consistent with the observed constraints. We then discuss several recent algorithms for extracting policies from movement data in which observations are recorded under variable, unknown constraints. Finally, we review a number of experiments testing the performance of these algorithms and demonstrating how the resulting policy models generalise over constraints, allowing behaviour to be predicted in unseen settings where new constraints apply.

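To make the consistency idea concrete, the sketch below is a minimal, purely illustrative example rather than any specific algorithm from the chapter. It assumes that observed actions are orthogonal projections u = N(x) pi(x) of an unobserved policy pi under unknown, randomly varying one-dimensional constraints, and fits a linear policy by requiring its component along each observed action to reproduce that action. The data generator, the loss and all names (true_policy, random_projection) are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
dim, n = 3, 500

def true_policy(x):
    # Hypothetical ground-truth policy: a linear attractor to the origin.
    return -x

def random_projection(dim):
    # Null-space projector N = I - a a^T / (a^T a) for a random 1-D constraint a.
    a = rng.normal(size=dim)
    return np.eye(dim) - np.outer(a, a) / (a @ a)

# Demonstrations: each observed action is the true policy projected by an
# unknown, per-sample constraint (the constraint itself is never recorded).
X = rng.uniform(-1.0, 1.0, size=(n, dim))
U = np.array([random_projection(dim) @ true_policy(x) for x in X])

# Fit a linear model pi(x) = W x by minimising the inconsistency
#   sum_n || u_n - (u_n u_n^T / ||u_n||^2) W x_n ||^2,
# i.e. the model's component along each observed action must reproduce it.
# The loss is quadratic in W, so it reduces to one linear least-squares solve.
rows, targets = [], []
for x, u in zip(X, U):
    if u @ u < 1e-8:
        continue                              # skip (near-)fully constrained samples
    P = np.outer(u, u) / (u @ u)              # projector onto the observed direction
    rows.append(P @ np.kron(np.eye(dim), x))  # maps vec(W) (row-major) to P W x
    targets.append(u)

w, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(targets), rcond=None)
W_hat = w.reshape(dim, dim)
print(np.round(W_hat, 2))   # close to -I: the unconstrained policy is recovered

Because each observation only reveals the policy's component within that sample's null space, the fit relies on the constraints varying across the data; under a single fixed constraint the unconstrained policy would not be identifiable.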