Learning constraints from demonstrations with grid and parametric representations

We extend the learning-from-demonstration paradigm with a method for learning unknown constraints that are shared across tasks. The method uses demonstrations of the tasks, their cost functions, and knowledge of the system dynamics and control constraints. Given safe demonstrations, it uses hit-and-run sampling to obtain lower-cost, and thus unsafe, trajectories. The safe and unsafe trajectories are then used together to obtain a consistent representation of the unsafe set by solving an integer program. Our method generalizes across system dynamics and learns a guaranteed subset of the constraint. We also provide a theoretical analysis of which subset of the constraint is learnable from safe demonstrations. We demonstrate the method on linear and nonlinear system dynamics, show that it can be modified to work with suboptimal demonstrations, and show that it can also be used to learn constraints in a feature space.
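The sampling step can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the paper's implementation: the cost is assumed to be the squared norm of a flattened trajectory vector, so the lower-cost set is a ball and each hit-and-run chord has a closed form. The function name `hit_and_run_lower_cost` and the example demonstration are hypothetical, and the known dynamics and control constraints that the method also enforces are omitted here.

```python
import numpy as np

def hit_and_run_lower_cost(x_demo, n_samples, rng):
    """Hit-and-run samples from {x : ||x||^2 <= ||x_demo||^2}.

    Assumption: cost(x) = ||x||^2 is a stand-in for the task cost, so every
    sample costs no more than the demonstration.
    """
    x = np.asarray(x_demo, dtype=float).copy()
    r2 = x @ x                      # cost of the demonstration
    samples = []
    for _ in range(n_samples):
        # Draw a uniformly random direction on the unit sphere.
        d = rng.normal(size=x.shape)
        d /= np.linalg.norm(d)
        # Solve ||x + t d||^2 <= r2 for t: t^2 + 2 b t + c <= 0.
        b = x @ d
        c = x @ x - r2              # <= 0 while x stays inside the set
        disc = np.sqrt(max(b * b - c, 0.0))
        t_lo, t_hi = -b - disc, -b + disc
        # Move to a uniformly random point on the chord.
        x = x + rng.uniform(t_lo, t_hi) * d
        samples.append(x.copy())
    return np.array(samples)

rng = np.random.default_rng(0)
x_demo = np.array([3.0, 4.0])       # hypothetical demonstration, cost 25
samples = hit_and_run_lower_cost(x_demo, 200, rng)
```

For a general cost, the chord endpoints would have to be found numerically, for example by bisection along the sampled direction, rather than in closed form.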
