Regression via Kirszbraun Extension with Applications to Imitation Learning

Learning by demonstration is a versatile and rapid mechanism for transferring motor skills from a teacher to a learner. A particular challenge in imitation learning is the so-called correspondence problem: mapping actions between a teacher and a learner whose embodiments differ substantially (say, human to robot). We present a general, model-free, non-parametric imitation learning algorithm based on regression between two Hilbert spaces. We accomplish this via Kirszbraun's extension theorem (apparently the first application of this technique to supervised learning) and analyze its statistical and computational aspects. We begin by formulating the correspondence problem as a regression problem expressed as a quadratically constrained quadratic program (QCQP). We then describe a procedure for smoothing the training data, which amounts to regularizing hypothesis complexity via the Lipschitz constant of the hypothesis. The Lipschitz constant is tuned by a Structural Risk Minimization (SRM) procedure based on covering-number risk bounds that we derive. We apply our technique to a static posture imitation task between two robotic manipulators with different embodiments and report promising results.
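To make the role of the Lipschitz constant concrete, the following is a minimal sketch, not the paper's algorithm. It assumes finite-dimensional Euclidean inputs and outputs, and it only computes the smallest Lipschitz constant consistent with a training set and checks whether a candidate output at a new point respects that constant. Kirszbraun's theorem guarantees that at least one such admissible output exists for maps between Hilbert spaces; finding it (and smoothing the data) is the QCQP step, which this sketch does not solve. The function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def dist(u, v):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def empirical_lipschitz(xs, ys):
    """Smallest L with ||y_i - y_j|| <= L * ||x_i - x_j|| for all training pairs."""
    best = 0.0
    for i, j in combinations(range(len(xs)), 2):
        dx = dist(xs[i], xs[j])
        dy = dist(ys[i], ys[j])
        if dx == 0.0:
            if dy > 0.0:
                return math.inf  # same input, different outputs: no finite L
            continue
        best = max(best, dy / dx)
    return best


def is_admissible(x_new, y_new, xs, ys, L, tol=1e-9):
    """Check the L-Lipschitz constraints between (x_new, y_new) and every sample."""
    return all(dist(y_new, y) <= L * dist(x_new, x) + tol for x, y in zip(xs, ys))


# Hypothetical toy data: 2-D teacher poses mapped to 1-D learner poses.
xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
ys = [(0.0,), (3.0,), (2.0,)]

L_hat = empirical_lipschitz(xs, ys)
ok = is_admissible((0.5, 0.0), (1.5,), xs, ys, L_hat)
bad = is_admissible((0.5, 0.0), (5.0,), xs, ys, L_hat)
```

Choosing a value of L smaller than `L_hat` would require moving the training outputs so that all pairwise constraints hold, which is the smoothing step the abstract describes as regularizing via the Lipschitz constant.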
