SHIV: Reducing supervisor burden in DAgger using support vectors for efficient learning from demonstrations in high dimensional state spaces

Online learning from demonstration algorithms such as DAgger can learn policies for problems where the system dynamics and the cost function are unknown. However they impose a burden on supervisors to respond to queries each time the robot encounters new states while executing its current best policy. The MMD-IL algorithm reduces supervisor burden by filtering queries with insufficient discrepancy in distribution and maintaining multiple policies. We introduce the SHIV algorithm (Svm-based reduction in Human InterVention), which converges to a single policy and reduces supervisor burden in non-stationary high dimensional state distributions. To facilitate scaling and outlier rejection, filtering is based on a measure of risk defined in terms of distance to an approximate level set boundary defined by a One Class support vector machine. We report on experiments in three contexts: 1) a driving simulator with a 27,936 dimensional visual feature space, 2) a push-grasping in clutter simulation with a 22 dimensional state space, and 3) physical surgical needle insertion with a 16 dimensional state space. Results suggest that SHIV can efficiently learn policies with up to 70% fewer queries that DAgger.

[1]  M. Ivimey Annual report , 1958, IRE Transactions on Engineering Writing and Speech.

[2]  Dean Pomerleau,et al.  ALVINN, an autonomous land vehicle in a neural network , 2015 .

[3]  David A. Cohn,et al.  Training Connectionist Networks with Queries and Selective Sampling , 1989, NIPS.

[4]  Raymond T. Ng,et al.  Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.

[5]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[6]  Daphne Koller,et al.  Support Vector Machine Active Learning with Applications to Text Classification , 2000, J. Mach. Learn. Res..

[7]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[8]  David A. Cohn,et al.  Improving generalization with active learning , 1994, Machine Learning.

[9]  Pieter Abbeel,et al.  Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[10]  Victoria J. Hodge,et al.  A Survey of Outlier Detection Methodologies , 2004, Artificial Intelligence Review.

[11]  A. Atiya,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[12]  Pieter Abbeel,et al.  An Application of Reinforcement Learning to Aerobatic Helicopter Flight , 2006, NIPS.

[13]  Jean-Philippe Vert,et al.  Consistency and Convergence Rates of One-Class SVMs and Related Algorithms , 2006, J. Mach. Learn. Res..

[14]  Daniel H. Grollman,et al.  Dogged Learning for Robots , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[15]  Larry A. Wasserman,et al.  Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo , 2007, AISTATS.

[16]  Ross D. King,et al.  Active Learning for Regression Based on Query by Committee , 2007, IDEAL.

[17]  Sebastian Thrun,et al.  Apprenticeship learning for motion planning with application to parking lot navigation , 2008, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[18]  Brett Browning,et al.  A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..

[19]  Manuela M. Veloso,et al.  Interactive Policy Learning through Confidence-Based Autonomy , 2014, J. Artif. Intell. Res..

[20]  Pieter Abbeel,et al.  Superhuman performance of surgical tasks by robots using iterative learning from human-guided demonstrations , 2010, 2010 IEEE International Conference on Robotics and Automation.

[21]  J. Andrew Bagnell,et al.  Efficient Reductions for Imitation Learning , 2010, AISTATS.

[22]  Robert E. Kass,et al.  Importance sampling: a review , 2010 .

[23]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[24]  Alan Fern,et al.  Active Imitation Learning via State Queries , 2011 .

[25]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[26]  Thomas G. Dietterich,et al.  Active Imitation Learning via Reduction to I.I.D. Active Learning , 2012, AAAI Fall Symposium: Robots Learning Interactively from Human Teachers.

[27]  Martial Hebert,et al.  Learning monocular reactive UAV control in cluttered natural environments , 2012, 2013 IEEE International Conference on Robotics and Automation.

[28]  Felix Duvallet,et al.  Imitation learning for natural language direction following through unknown environments , 2013, 2013 IEEE International Conference on Robotics and Automation.

[29]  Joelle Pineau,et al.  Maximum Mean Discrepancy Imitation Learning , 2013, Robotics: Science and Systems.

[30]  Gang Hua,et al.  Unsupervised One-Class Learning for Automatic Outlier Removal , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Honglak Lee,et al.  Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.

[32]  Siddhartha S. Srinivasa,et al.  Nonprehensile whole arm rearrangement planning on physics manifolds , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[33]  Murat Cenk Cavusoglu,et al.  Optimal needle grasp selection for automatic execution of suturing tasks in robotic minimally invasive surgery , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[34]  Sergey Levine,et al.  End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..