Learning Trajectory Preferences for Manipulators via Iterative Improvement

We consider the problem of learning good trajectories for manipulation tasks. This is challenging because the criterion defining a good trajectory varies with users, tasks and environments. In this paper, we propose a co-active online learning framework for teaching robots the preferences of their users for object manipulation tasks. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this co-active preference feedback can be more easily elicited from the user than demonstrations of optimal trajectories, which are often challenging and non-intuitive to provide on manipulators with many degrees of freedom. Nevertheless, the theoretical regret bounds of our algorithm match the asymptotic rates of optimal trajectory algorithms. We demonstrate the generalizability of our algorithm on a variety of grocery checkout tasks, for which the preferences were influenced not only by the object being manipulated but also by the surrounding environment.
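The feedback loop described above (the system proposes a trajectory, the user returns a slightly better one, the system updates) can be illustrated with a perceptron-style update of the kind used in coactive learning. This is a minimal sketch, not the paper's algorithm: it assumes a linear scoring function over trajectory features, and the feature vectors, the candidate set, and the function names `coactive_update` and `best_candidate` are all hypothetical.

```python
import numpy as np

def coactive_update(w, phi_proposed, phi_improved):
    """Perceptron-style step: shift weights toward the user's improved trajectory."""
    return w + (phi_improved - phi_proposed)

def best_candidate(w, candidate_features):
    """Index of the candidate trajectory with the highest linear score w . phi."""
    return int(np.argmax(candidate_features @ w))

# Hypothetical 2-D feature vectors for three candidate trajectories.
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])

w = np.zeros(2)                             # start with no preference knowledge
proposed = best_candidate(w, candidates)    # all scores tie, so index 0 is proposed
improved = 1                                # assumed user feedback: candidate 1 is better
w = coactive_update(w, candidates[proposed], candidates[improved])
next_proposed = best_candidate(w, candidates)
```

In this toy setting a single round of feedback is enough for the system to propose the user's preferred candidate next. The user never has to supply the optimal trajectory, only one that scores higher under their own preferences than the current proposal.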
