Mobile Robot Motion Control from Demonstration and Corrective Feedback

Robust motion control algorithms are fundamental to the successful, autonomous operation of mobile robots. Motion control is a notoriously difficult problem, and is typically governed by a policy, or state-action mapping. In this chapter, we present an approach for the refinement of mobile robot motion control policies that incorporates corrective feedback from a human teacher. The target application domain of this work is the low-level motion control of a mobile robot. Within such domains, the rapid sampling rate and continuous action space of policies both pose key challenges to providing policy corrections. To address these challenges, we contribute advice-operators as a corrective feedback form suitable for providing continuous-valued corrections, and Focused Feedback For Mobile Robot Policies (F3MRP) as a framework suitable for providing feedback on policies sampled at high frequency. Under our approach, the policies refined through teacher feedback are initially derived using Learning from Demonstration (LfD) techniques, which generalize a policy from example task executions by a teacher. We apply our techniques within the Advice-Operator Policy Improvement (A-OPI) algorithm, validated on a Segway RMP robot within a motion control domain. A-OPI refines LfD policies by correcting policy performance via advice-operators and F3MRP. Within our validation domain, policy performance is found to improve with corrective teacher feedback, and moreover to be similar or superior to that of policies derived from additional teacher demonstrations.
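
To make the refinement pipeline concrete, the sketch below illustrates one A-OPI-style correction step in Python, under stated assumptions: the policy is a simple 1-nearest-neighbor regressor standing in for the locally weighted regression of the original work, and the two advice-operators, their gains, and the segment-selection interface (standing in for F3MRP) are hypothetical examples, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of an A-OPI-style refinement step. All names here
# (Policy, the advice-operator definitions, the 1-NN regressor) are
# illustrative assumptions, not the original implementation.

class Policy:
    """State-action mapping derived from demonstration data.

    Uses 1-nearest-neighbor regression over (state, action) pairs;
    any continuous-valued regressor fits the same interface.
    """

    def __init__(self, states, actions):
        self.states = np.asarray(states, dtype=float)
        self.actions = np.asarray(actions, dtype=float)

    def predict(self, state):
        # Return the action of the closest demonstrated state.
        dists = np.linalg.norm(self.states - np.asarray(state), axis=1)
        return self.actions[np.argmin(dists)]

# An advice-operator maps an observed action to a corrected one,
# letting the teacher give continuous-valued corrections without
# specifying exact numbers. The operators and gains below are
# assumed illustrative values; a = [v, omega] (translational,
# rotational speed).
ADVICE_OPERATORS = {
    "turn_tighter": lambda a: np.array([a[0], a[1] * 1.25]),
    "slow_down":    lambda a: np.array([a[0] * 0.8, a[1]]),
}

def refine(policy, executed_states, executed_actions, advice, segment):
    """One refinement step (sketch).

    The teacher reviews an execution, flags a segment of the
    high-frequency trajectory (the role F3MRP plays), and names an
    advice-operator. Corrected points are synthesized and folded
    back into the demonstration set; the policy is re-derived.
    """
    op = ADVICE_OPERATORS[advice]
    new_states, new_actions = [], []
    for i in segment:  # indices selected via the feedback framework
        new_states.append(executed_states[i])
        new_actions.append(op(executed_actions[i]))
    states = np.vstack([policy.states, new_states])
    actions = np.vstack([policy.actions, new_actions])
    return Policy(states, actions)
```

The key design point the sketch captures is that the teacher names an operator and a trajectory segment rather than typing numeric corrections: the operator synthesizes the continuous-valued corrected data, sidestepping both the continuous action space and the high sampling rate.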
