Learning preferences for manipulation tasks from online coactive feedback

We consider the problem of learning preferences over trajectories for mobile manipulators such as personal robots and assembly line robots. The preferences we learn are more intricate than simple geometric constraints on trajectories; they are governed by the surrounding context of objects and human interactions in the environment. We propose a coactive online learning framework for teaching preferences in contextually rich environments. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but only needs to iteratively provide trajectories that slightly improve on the trajectory currently proposed by the system. We argue that this coactive preference feedback can be elicited more easily than demonstrations of optimal trajectories. Nevertheless, the theoretical regret bounds of our algorithm match the asymptotic rates of algorithms that learn from optimal trajectories. We implement our algorithm on two high-degree-of-freedom robots, PR2 and Baxter, and present three intuitive mechanisms for providing such incremental feedback. In our experimental evaluation we consider two context-rich settings, household chores and grocery store checkout, and show that users are able to train the robot with only a few rounds of feedback (taking only a few minutes).
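The feedback loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear score over trajectory features, a perceptron-style update of the kind used in the coactive learning literature, and a small finite set of candidate trajectories. The names `propose`, `coactive_update`, `simulate`, and the hidden preference vector `w_true` are hypothetical and exist only for this toy simulation.

```python
import numpy as np


def propose(w, candidate_features):
    """Return the index of the candidate with the highest linear score w . phi."""
    return int(np.argmax(candidate_features @ w))


def coactive_update(w, phi_proposed, phi_improved, step=1.0):
    """Perceptron-style update (an assumption): shift w toward the features of
    the user's improved trajectory and away from the proposed one."""
    return w + step * (phi_improved - phi_proposed)


def simulate(w, candidates, w_true, rounds=5):
    """Simulate a user who returns a slightly better candidate, not the best one.

    w_true is a hidden preference vector used only to simulate the user.
    """
    history = []
    true_scores = candidates @ w_true
    for _ in range(rounds):
        i = propose(w, candidates)
        history.append(i)
        # Candidates the simulated user strictly prefers to the proposal.
        better = [j for j in range(len(candidates)) if true_scores[j] > true_scores[i]]
        if better:
            # Feedback is the smallest available improvement, not the optimum.
            j = min(better, key=lambda k: true_scores[k])
            w = coactive_update(w, candidates[i], candidates[j])
    return w, history


# Toy data: three candidate trajectories, each described by two features.
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
w_final, history = simulate(np.zeros(2), candidates, w_true=np.array([0.0, 1.0]))
print(history, w_final)
```

In this toy run the simulated user never demonstrates the best candidate directly; a single incremental correction is enough for the learner to start proposing it.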
