Offline Learning of Counterfactual Perception as Prediction for Real-World Robotic Reinforcement Learning

We propose a method for offline learning of counterfactual predictions to address the challenges of real-world robotic reinforcement learning. The proposed method encodes action-oriented visual observations as several "what if" questions learned offline from prior experience using reinforcement learning methods. These "what if" questions counterfactually predict, on multiple temporal scales, how action-conditioned observations would evolve if the agent were to stick to its current action. We show that combining these offline counterfactual predictions with online in-situ observations (e.g., force feedback) enables efficient policy learning from only a sparse terminal (success/failure) reward. We argue that the learned predictions form an effective representation of the visual task and guide online exploration toward interactions with high success potential (e.g., contact-rich regions). Experiments were conducted in both simulation and real-world scenarios for evaluation. Our results demonstrate that it is practical to train a reinforcement learning agent to perform real-world fine manipulation in about half a day, without hand-engineered perception systems or calibrated instrumentation. Recordings of the real robot training can be found via this https URL.
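The multi-timescale "what if" questions described above are naturally formulated as temporal-difference predictions, where each discount factor sets one temporal scale and the prediction is learned under the counterfactual condition that the current action is held fixed. The sketch below is a minimal illustration of that idea using linear TD(0); the class name, the linear function approximation, and all hyperparameters are our own assumptions for exposition, not the paper's exact implementation.

```python
import numpy as np

class CounterfactualPredictor:
    """One "what if" prediction per temporal scale, learned by TD(0).

    Each head answers: if the agent kept its current action, what
    discounted sum of a cumulant (e.g. a contact/force signal) would it
    observe?  The discount gamma sets the temporal scale of the question.
    """

    def __init__(self, n_features, gammas, alpha=0.1):
        self.gammas = gammas
        self.alpha = alpha
        # one linear weight vector per temporal scale
        self.w = np.zeros((len(gammas), n_features))

    def predict(self, x):
        # stack of scalar predictions, one per gamma; this vector can
        # serve as a compact task representation for the policy
        return self.w @ x

    def update(self, x, cumulant, x_next):
        # TD(0) update on a transition collected while the action was
        # held fixed (the counterfactual condition)
        for i, g in enumerate(self.gammas):
            td_error = cumulant + g * self.w[i] @ x_next - self.w[i] @ x
            self.w[i] += self.alpha * td_error * x

# toy check: a single state emitting cumulant 1 every step, so each
# prediction should converge to the discounted sum 1 / (1 - gamma)
pred = CounterfactualPredictor(n_features=1, gammas=(0.5, 0.9))
x = np.ones(1)
for _ in range(2000):
    pred.update(x, cumulant=1.0, x_next=x)
preds = pred.predict(x)  # approaches [2.0, 10.0]
```

The same structure extends to the paper's setting by replacing the toy features with an encoding of the visual observation and training offline on logged interaction data.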
