Learning control policies from constrained motion

Many everyday human skills can be framed in terms of performing some task subject to constraints imposed by the task or the environment. Constraints are usually unobservable and frequently change between contexts. In this thesis, we explore the problem of learning control policies from data containing variable, dynamic and non-linear constraints on motion. We show that an effective approach for doing this is to learn the unconstrained policy in a way that is consistent with the constraints. We propose several novel algorithms for extracting these policies from movement data in which observations are recorded under different constraints. Furthermore, we show that, by doing so, we are able to learn representations of movement that generalise over constraints and can predict behaviour under new constraints. In our experiments, we test the algorithms on systems of varying size and complexity, and show that the novel approaches give significant improvements in performance compared with standard policy learning approaches that are naive to the effect of constraints. Finally, we illustrate the utility of the approaches for learning from human motion capture data and transferring behaviour to several robotic platforms.
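
To make the idea of learning a policy that is consistent with the constraints concrete, the following is a minimal illustrative sketch, not the thesis's algorithms (which deal with constraints that are unobserved). It assumes each observed action is a projection u_n = N_n pi(x_n) of an underlying unconstrained policy pi through a constraint (null-space projection) matrix N_n that, purely for illustration, is taken as known, and it fits a linear-in-features policy by least squares so that its projections reproduce the observations. The feature map phi and the helper fit_consistent_policy are hypothetical names introduced here.

```python
import numpy as np

def phi(x):
    # Simple polynomial feature map (illustrative choice).
    return np.concatenate(([1.0], x, x ** 2))

def fit_consistent_policy(X, U, N_list, dim_u):
    """Fit W so that N_n @ (W @ phi(x_n)) matches u_n in a least-squares sense."""
    dim_phi = phi(X[0]).size
    A_rows, b_rows = [], []
    for x, u, N in zip(X, U, N_list):
        # N @ W @ phi(x) == (phi(x)^T kron N) @ vec(W)   (column-major vec)
        A_rows.append(np.kron(phi(x)[None, :], N))
        b_rows.append(u)
    A = np.vstack(A_rows)               # (n * dim_u, dim_phi * dim_u)
    b = np.concatenate(b_rows)          # (n * dim_u,)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w.reshape(dim_phi, dim_u).T  # undo column-major vec -> W (dim_u x dim_phi)

# Toy usage: 2-D state/action, a different random rank-1 constraint per observation.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(2, 5))                  # "true" unconstrained policy
X = [rng.normal(size=2) for _ in range(200)]
N_list = []
for _ in X:
    a = rng.normal(size=2)
    a /= np.linalg.norm(a)
    N_list.append(np.eye(2) - np.outer(a, a))     # project out a random direction
U = [N @ (W_true @ phi(x)) for x, N in zip(X, N_list)]

W_hat = fit_consistent_policy(X, U, N_list, dim_u=2)
print(np.allclose(W_hat, W_true, atol=1e-6))      # should recover the unconstrained policy
```

Because the fit targets the unconstrained policy rather than the constrained observations themselves, applying W_hat without any projection generalises to constraints not seen in the data, which is the sense of generalisation over constraints described in the abstract.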
