Learning From Demonstrations in Changing Environments: Learning Cost Functions and Constraints for Motion Planning

We address the problem of performing complex tasks with a robot operating in changing environments. We propose two approaches: 1) defining task-specific cost functions for motion planning that represent path quality, learned from an expert's preferences, and 2) using a constraint-based representation of the task within the learning-from-demonstration paradigm. In the first approach, we generate a set of paths for a given task using a motion planner and collect data about their features (path length, distance from obstacles, etc.). We present these paths to an expert as a set of pairwise comparisons and form a ranking of the paths from the expert's choices. This ranking serves as training data for learning algorithms that attempt to produce a cost function mapping path feature values to a cost consistent with the expert's ranking. We test our method on two simulated car-maintenance tasks with the PR2 robot: removing a tire and extracting an oil filter. We found that learning methods which produce non-linear combinations of the features capture expert preferences for these tasks better than methods which produce linear combinations. This result suggests that the linear combinations used in previous work on this topic may be too simple to capture the preferences of experts for complex tasks. In the second approach, we introduce a constraint-based description of the task that can be used together with a motion planner to produce trajectories. The description is created automatically from a demonstration by segmenting the motion and extracting its constraints. The constraints are represented as Task Space Regions (TSRs), which are extracted from the demonstration and used to produce the desired motion. To account for parts of the motion governed by different constraints, the demonstrated motion is segmented using TSRs. The proposed approach allows a robot to perform tasks from human demonstration in changing environments, where the obstacle distribution or the poses of objects may change between demonstration and execution. An experimental evaluation on two example motions was performed to assess the ability of our approach to produce the desired motion and recover the demonstrated trajectory.
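
To make the first approach concrete, the following is a minimal sketch (not the paper's implementation) of learning a non-linear cost function from pairwise path comparisons: each training pair holds the feature vectors of a preferred and a less-preferred path, and a small one-hidden-layer model is fit with a logistic (Bradley-Terry style) pairwise loss so that the preferred path receives the lower cost. All names, feature choices, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def path_cost(params, x):
    """Cost of one path feature vector x under a one-hidden-layer model."""
    W, b, v, c0 = params
    return v @ np.tanh(W @ x + b) + c0

def cost_grads(params, x):
    """Gradients of the scalar cost with respect to each parameter."""
    W, b, v, c0 = params
    h = np.tanh(W @ x + b)
    dtanh = 1.0 - h ** 2                      # derivative of tanh at the pre-activations
    return np.outer(v * dtanh, x), v * dtanh, h, 1.0

def train_pairwise(pairs, n_features, n_hidden=8, lr=0.05, epochs=200, seed=0):
    """Fit a non-linear cost so that cost(better) < cost(worse) for each
    (better, worse) pair of path feature vectors."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((n_hidden, n_features))
    b = np.zeros(n_hidden)
    v = 0.1 * rng.standard_normal(n_hidden)
    c0 = 0.0
    for _ in range(epochs):
        for x_better, x_worse in pairs:
            params = (W, b, v, c0)
            # Logistic pairwise loss log(1 + exp(c_better - c_worse));
            # its derivative w.r.t. the cost difference is a sigmoid.
            g = 1.0 / (1.0 + np.exp(-(path_cost(params, x_better) - path_cost(params, x_worse))))
            gWb, gbb, gvb, gc0b = cost_grads(params, x_better)
            gWw, gbw, gvw, gc0w = cost_grads(params, x_worse)
            W -= lr * g * (gWb - gWw)
            b -= lr * g * (gbb - gbw)
            v -= lr * g * (gvb - gvw)
            c0 -= lr * g * (gc0b - gc0w)
    return (W, b, v, c0)

# Toy usage with two made-up features (path length, proximity-to-obstacle penalty):
# in each pair the first path is the one the expert preferred.
pairs = [(np.array([1.0, 0.2]), np.array([1.8, 0.9])),
         (np.array([1.2, 0.1]), np.array([1.1, 1.5]))]
learned = train_pairwise(pairs, n_features=2)
```

The learned cost can then score candidate paths produced by the planner; replacing the hidden layer with a single linear map recovers the linear baseline that the abstract reports as weaker.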
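
For the second approach, a Task Space Region constrains the end-effector pose by bounding its displacement, expressed in a task frame, within a box; the full TSR formulation bounds all six position and orientation components, whereas the sketch below checks only the position part. It is an illustrative assumption of how a TSR extracted from a demonstration could be evaluated during planning, not the paper's implementation.

```python
import numpy as np

def tsr_violation(p_ee, T_w_tsr, bounds):
    """How far a world-frame end-effector position p_ee lies outside a
    position-only Task Space Region.

    T_w_tsr : 4x4 homogeneous transform of the TSR frame w.r.t. the world.
    bounds  : 3x2 array of [min, max] limits per axis in the TSR frame.
    Returns 0.0 when the point satisfies the constraint.
    """
    R, t = T_w_tsr[:3, :3], T_w_tsr[:3, 3]
    p_local = R.T @ (p_ee - t)                       # express the point in the TSR frame
    below = np.maximum(bounds[:, 0] - p_local, 0.0)  # violation of the lower limits
    above = np.maximum(p_local - bounds[:, 1], 0.0)  # violation of the upper limits
    return float(np.linalg.norm(below + above))

# Toy usage: a TSR that keeps the end effector within 2 cm of a vertical line
# (e.g. pulling an oil filter straight out) while allowing 30 cm of travel along z.
tsr_frame = np.eye(4)
bounds = np.array([[-0.02, 0.02],
                   [-0.02, 0.02],
                   [ 0.00, 0.30]])
print(tsr_violation(np.array([0.01, 0.0, 0.15]), tsr_frame, bounds))  # 0.0, inside the TSR
print(tsr_violation(np.array([0.10, 0.0, 0.15]), tsr_frame, bounds))  # > 0, outside the TSR
```

A segmentation pass over the demonstration can use such a violation measure to detect where one candidate constraint stops holding and another begins, which is the role the abstract assigns to TSR-based segmentation.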
