Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency

At the heart of many robotics problems is the challenge of learning correspondences across domains. For instance, imitation learning requires a correspondence between humans and robots; sim-to-real transfer requires a correspondence between physics simulators and the real world; and transfer learning requires correspondences between different robotics environments. This paper aims to learn correspondences across domains that differ in representation (vision vs. internal state), physics parameters (mass and friction), and morphology (number of limbs). Importantly, the correspondences are learned from unpaired, randomly collected data in the two domains. We propose \textit{dynamics cycles}, which align dynamic robot behavior across two domains using a cycle-consistency constraint. Once this correspondence is found, a policy trained in one domain can be transferred directly to the other without any additional fine-tuning in the second domain. We evaluate our framework across a variety of problem settings, both in simulation and on a real robot. It is able to align uncalibrated monocular video of a real robot arm with dynamic state-action trajectories of a simulated arm without paired data. Video demonstrations of our results are available at this https URL.
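To make the cycle-consistency idea concrete, the sketch below shows one way such a constraint could be written as a training loss, assuming a learned state translator G (source states to target states), an action translator H (source state-action pairs to target actions), and a learned forward dynamics model f_y for the target domain. All module names, network sizes, dimensions, and the PyTorch framing are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (assumed, not the paper's code) of a dynamics cycle-consistency loss:
# translating a source transition into the target domain and stepping the target
# dynamics should agree with translating the source next state directly.
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class DynamicsCycle(nn.Module):
    def __init__(self, sx_dim, ax_dim, sy_dim, ay_dim):
        super().__init__()
        self.state_map = mlp(sx_dim, sy_dim)            # G: S_x -> S_y
        self.action_map = mlp(sx_dim + ax_dim, ay_dim)  # H: (s_x, a_x) -> a_y
        # f_y: target-domain forward dynamics; in practice this would be trained
        # on target-domain transitions and held fixed while fitting G and H.
        self.forward_y = mlp(sy_dim + ay_dim, sy_dim)

    def cycle_loss(self, s_x, a_x, s_x_next):
        """|| f_y(G(s_x), H(s_x, a_x)) - G(s_x_next) ||^2 over a batch of
        unpaired source-domain transitions (s_x, a_x, s_x_next)."""
        s_y = self.state_map(s_x)
        a_y = self.action_map(torch.cat([s_x, a_x], dim=-1))
        s_y_next_pred = self.forward_y(torch.cat([s_y, a_y], dim=-1))
        s_y_next = self.state_map(s_x_next)
        return ((s_y_next_pred - s_y_next) ** 2).mean()


if __name__ == "__main__":
    # Toy shapes: 8-D source state, 2-D source action, 6-D target state, 3-D target action.
    model = DynamicsCycle(sx_dim=8, ax_dim=2, sy_dim=6, ay_dim=3)
    s_x, a_x, s_x_next = torch.randn(32, 8), torch.randn(32, 2), torch.randn(32, 8)
    loss = model.cycle_loss(s_x, a_x, s_x_next)
    loss.backward()
```

This sketch isolates only the dynamics cycle term; in a full system, additional objectives (for example, adversarial losses to keep translated states on the target-domain distribution) would typically accompany it.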
