Transferring Visuomotor Learning from Simulation to the Real World for Robotics Manipulation Tasks

Hand-eye coordination is a requirement for many manipulation tasks, including grasping and reaching. However, accurate hand-eye coordination has proven especially difficult to achieve in complex robots like the iCub humanoid. In this work, we solve the hand-eye coordination task using a visuomotor deep neural network predictor that estimates the arm's joint configuration given a stereo image pair of the arm and the underlying head configuration. As there are various unavoidable sources of sensing error on the physical robot, we train the predictor on images obtained from simulation. The simulated images are modified to look realistic using an image-to-image translation approach. In our experiments, we first show that the visuomotor predictor provides accurate joint estimates of the iCub's hand in simulation. We then show that the predictor can be used to obtain the systematic error of the robot's joint measurements on the physical iCub robot. We demonstrate that a calibrator can be designed to automatically compensate for this error. Finally, we validate that this enables accurate reaching of objects while circumventing manual fine-calibration of the robot.
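
The abstract does not specify how the calibrator is built, so the following is only a minimal sketch of the general idea under a strong assumption: that the systematic error is a constant per-joint offset between the predictor's joint estimates and the encoder measurements. The function names `estimate_joint_offsets` and `compensate` are hypothetical and do not come from the paper, and the actual calibrator may use a different error model.

```python
from statistics import fmean


def estimate_joint_offsets(predicted, measured):
    """Estimate a constant per-joint offset (ASSUMED error model).

    predicted: list of joint vectors produced by the visuomotor predictor
    measured:  list of joint vectors read from the robot's encoders
    Returns the per-joint mean of (predicted - measured) over all samples.
    """
    if not predicted or len(predicted) != len(measured):
        raise ValueError("need the same non-zero number of samples in both lists")
    n_joints = len(predicted[0])
    return [
        fmean(p[j] - m[j] for p, m in zip(predicted, measured))
        for j in range(n_joints)
    ]


def compensate(measured_sample, offsets):
    """Correct one encoder reading by adding the estimated offsets."""
    return [m + o for m, o in zip(measured_sample, offsets)]


# Toy data in degrees (illustrative values only, not from the paper).
predicted = [[10.0, 20.0], [12.0, 22.0]]
measured = [[9.0, 22.0], [11.0, 24.0]]

offsets = estimate_joint_offsets(predicted, measured)
corrected = compensate(measured[0], offsets)
```

Averaging over many arm poses is what makes the estimate reflect the systematic part of the error rather than per-frame prediction noise; an error that varies with joint angle would need a richer model than a single offset per joint.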
