Paper Information - Regression via Kirszbraun Extension with Applications to Imitation Learning


Learning by demonstration is a versatile and rapid mechanism for transferring motor skills from a teacher to a learner. A particular challenge in imitation learning is the so-called correspondence problem, which involves mapping actions between a teacher and a learner that have substantially different embodiments (say, human to robot). We present a general, model-free, and non-parametric imitation learning algorithm based on regression between two Hilbert spaces. We accomplish this via Kirszbraun's extension theorem (apparently the first application of this technique to supervised learning) and analyze its statistical and computational aspects. We begin by formulating the correspondence problem as a quadratically constrained quadratic program (QCQP) regression problem. Then we describe a procedure for smoothing the training data, which amounts to regularizing hypothesis complexity via its Lipschitz constant. The Lipschitz constant is tuned via a Structural Risk Minimization (SRM) procedure, based on the covering-number risk bounds we derive. We apply our technique to a static posture imitation task between two robotic manipulators with different embodiments, and report promising results.
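The quantity being regularized can be made concrete with a small sketch. Assuming finite-dimensional Euclidean inputs and outputs (a special case of the Hilbert-space setting), the code below computes the empirical Lipschitz constant of a training set and checks whether a candidate set of smoothed outputs respects a given bound L. The function names `dist`, `lipschitz_constant`, and `is_feasible` are hypothetical and not taken from the paper, and the paper's QCQP solver and extension step are not reproduced here.

```python
import math
from itertools import combinations


def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def lipschitz_constant(xs, ys):
    """Smallest L with ||y_i - y_j|| <= L * ||x_i - x_j|| over all sample pairs."""
    best = 0.0
    for i, j in combinations(range(len(xs)), 2):
        dx = dist(xs[i], xs[j])
        dy = dist(ys[i], ys[j])
        if dx == 0.0:
            if dy > 0.0:
                return math.inf  # identical inputs with different outputs
            continue
        best = max(best, dy / dx)
    return best


def is_feasible(xs, zs, L, tol=1e-12):
    """True if the outputs zs are L-Lipschitz with respect to the inputs xs."""
    return lipschitz_constant(xs, zs) <= L + tol


# Toy data (illustrative only): 2-D "teacher" inputs, 3-D "learner" outputs.
xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
ys = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
zs = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.5)]  # outputs halved

L_raw = lipschitz_constant(xs, ys)
L_smooth = lipschitz_constant(xs, zs)
```

In this toy case, halving the outputs halves the empirical Lipschitz constant. A smoothing step in the spirit of the abstract would instead search for the outputs closest to `ys` that satisfy `is_feasible` for a chosen L.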
