Behavior sequencing based on demonstrations: a case of a humanoid opening a door while walking

There is neuroscientific evidence suggesting that imitation between humans is goal-directed: when performing multiple tasks, we internally define an unknown optimal policy that satisfies multiple goals. This work presents a method to transfer a complex behavior, composed of a sequence of multiple tasks, from a human demonstrator to a humanoid robot. We define a multi-objective reward function as a measure of goal optimality for both the human and the robot, specified for each subtask of the global behavior. We optimize a sequential policy that generates whole-body movements for the robot whose reward profile is compared and matched with the human reward profile, producing an imitative behavior. Furthermore, we can search in the proximity of the solution space to improve the reward profile and innovate a new solution that is more beneficial for the humanoid. Experiments were carried out on a real humanoid robot.
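The profile-matching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the quadratic sub-task rewards, the squared-error matching cost, and the random-search policy update (a crude stand-in for the paper's sequential policy optimization) are all illustrative assumptions, as are the names `reward_profile` and `profile_cost`.

```python
import numpy as np

def reward_profile(traj, goals, weights):
    """Multi-objective reward at each time step: a weighted sum of
    per-subtask rewards (here, a toy negative squared distance to
    each subtask goal)."""
    # traj: (T, d) states; goals: (K, d) subtask goals; weights: (K,)
    r = np.stack([-np.sum((traj - g) ** 2, axis=1) for g in goals])  # (K, T)
    return weights @ r  # (T,)

def profile_cost(robot_traj, human_profile, goals, weights):
    """Imitation cost: squared mismatch between the robot's reward
    profile and the profile measured on the human demonstration."""
    return np.sum(
        (reward_profile(robot_traj, goals, weights) - human_profile) ** 2
    )

# Toy demonstration: the human passes through two subtask goals
# (e.g. reach the door handle, then open the door while walking).
rng = np.random.default_rng(0)
goals = np.array([[1.0, 0.0], [1.0, 1.0]])
weights = np.array([0.6, 0.4])
human = np.linspace([0.0, 0.0], [1.0, 1.0], 20)  # demonstrated trajectory
human_profile = reward_profile(human, goals, weights)

# Perturbed initial robot trajectory, then random-search refinement:
# keep candidates whose reward profile better matches the human's.
best = human + 0.5 * rng.normal(size=human.shape)
best_cost = profile_cost(best, human_profile, goals, weights)
for _ in range(500):
    cand = best + 0.05 * rng.normal(size=best.shape)
    c = profile_cost(cand, human_profile, goals, weights)
    if c < best_cost:
        best, best_cost = cand, c
```

Because the cost compares reward profiles rather than joint trajectories directly, the robot is free to realize the subtask goals with its own embodiment, which is the sense in which the method also permits "skill innovation" beyond pure imitation.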
