How to train your robot with deep reinforcement learning: lessons we have learned
Sergey Levine | Chelsea Finn | Mrinal Kalakrishnan | Jie Tan | Julian Ibarz | Peter Pastor