Offline Learning of Counterfactual Perception as Prediction for Real-World Robotic Reinforcement Learning

We propose a method for offline learning of counterfactual predictions to address the challenges of real-world robotic reinforcement learning. The proposed method encodes action-oriented visual observations as several "what if" questions learned offline from prior experience using reinforcement learning methods. These "what if" questions counterfactually predict, on multiple temporal scales, how action-conditioned observations would evolve if the agent were to stick to its current action. We show that combining these offline counterfactual predictions with online in-situ observations (e.g., force feedback) enables efficient policy learning from only a sparse terminal (success/failure) reward. We argue that the learned predictions form an effective representation of the visual task and guide online exploration toward interactions with high success potential (e.g., contact-rich regions). Experiments were conducted in both simulation and real-world scenarios for evaluation. Our results demonstrate that it is practical to train a reinforcement learning agent to perform real-world fine manipulation in about half a day, without hand-engineered perception systems or calibrated instrumentation. Recordings of the real robot training can be found via this https URL.
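The multi-timescale "what if" questions described above are naturally formulated as temporal-difference predictions, where each discount factor sets one temporal scale and the prediction is learned under the counterfactual condition that the current action is held fixed. The sketch below is a minimal illustration of that idea using linear TD(0); the class name, the linear function approximation, and all hyperparameters are our own assumptions for exposition, not the paper's exact implementation.

```python
import numpy as np

class CounterfactualPredictor:
    """One "what if" prediction per temporal scale, learned by TD(0).

    Each head answers: if the agent kept its current action, what
    discounted sum of a cumulant (e.g. a contact/force signal) would it
    observe?  The discount gamma sets the temporal scale of the question.
    """

    def __init__(self, n_features, gammas, alpha=0.1):
        self.gammas = gammas
        self.alpha = alpha
        # one linear weight vector per temporal scale
        self.w = np.zeros((len(gammas), n_features))

    def predict(self, x):
        # stack of scalar predictions, one per gamma; this vector can
        # serve as a compact task representation for the policy
        return self.w @ x

    def update(self, x, cumulant, x_next):
        # TD(0) update on a transition collected while the action was
        # held fixed (the counterfactual condition)
        for i, g in enumerate(self.gammas):
            td_error = cumulant + g * self.w[i] @ x_next - self.w[i] @ x
            self.w[i] += self.alpha * td_error * x

# toy check: a single state emitting cumulant 1 every step, so each
# prediction should converge to the discounted sum 1 / (1 - gamma)
pred = CounterfactualPredictor(n_features=1, gammas=(0.5, 0.9))
x = np.ones(1)
for _ in range(2000):
    pred.update(x, cumulant=1.0, x_next=x)
preds = pred.predict(x)  # approaches [2.0, 10.0]
```

The same structure extends to the paper's setting by replacing the toy features with an encoding of the visual observation and training offline on logged interaction data.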
