Scaling data-driven robotics with reward sketching and batch reinforcement learning
Oleg Sushkov | Misha Denil | Sergio Gomez Colmenarejo | Nando de Freitas | Alexander Novikov | Ziyu Wang | David Budden | Ksenia Konyushkova | Scott Reed | Jonathan Scholz | David Barker | Yusuf Aytar | Konrad Zolna | Serkan Cabi | Rae Jeong | Mel Vecerik
[1] Sergey Levine, et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, 2018, CoRL.
[2] Marcin Andrychowicz, et al. Overcoming Exploration in Reinforcement Learning with Demonstrations, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[3] Stefano Ermon, et al. InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations, 2017, NIPS.
[4] S. Fienberg, et al. Log linear representation for paired and multiple comparisons models, 1976.
[5] Nando de Freitas, et al. Reinforcement and Imitation Learning for Diverse Visuomotor Skills, 2018, Robotics: Science and Systems.
[6] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[7] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[8] Andrew J. Davison, et al. Sim-to-Real Reinforcement Learning for Deformable Object Manipulation, 2018, CoRL.
[9] Abhinav Gupta, et al. Learning to fly by crashing, 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[10] OpenAI. Learning Dexterous In-Hand Manipulation, 2018.
[11] Sergey Levine, et al. Unsupervised Perceptual Rewards for Imitation Learning, 2016, Robotics: Science and Systems.
[12] Nando de Freitas, et al. A Bayesian interactive optimization approach to procedural animation design, 2010, SCA '10.
[13] Wes McKinney, et al. Data Structures for Statistical Computing in Python, 2010, SciPy.
[14] Filip Radlinski, et al. Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search, 2007, TOIS.
[15] Andrew W. Moore, et al. Policy Search using Paired Comparisons, 2003, J. Mach. Learn. Res.
[16] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res.
[17] F. Mosteller. Remarks on the method of paired comparisons: I. The least squares solution assuming equal standard deviations and equal correlations, 1951.
[18] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[19] Wei Chu, et al. Preference learning with Gaussian processes, 2005, ICML.
[20] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks (2008 Special Issue).
[21] Sergey Levine, et al. Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping, 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[22] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[23] Rémi Munos, et al. Observe and Look Further: Achieving Consistent Performance on Atari, 2018, ArXiv.
[24] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[25] Michèle Sebag, et al. APRIL: Active Preference-learning based Reinforcement Learning, 2012, ECML/PKDD.
[26] Stefan Schaal, et al. Learning force control policies for compliant manipulation, 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[27] Jan Peters, et al. Reinforcement learning in robotics: A survey, 2013, Int. J. Robotics Res.
[28] Sergey Levine, et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, 2017, ICLR.
[29] Stefano Ermon, et al. Generative Adversarial Imitation Learning, 2016, NIPS.
[30] Andrew J. Davison, et al. Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task, 2017, CoRL.
[31] Travis E. Oliphant, et al. Guide to NumPy, 2015.
[32] Matthew W. Hoffman, et al. Distributed Distributional Deterministic Policy Gradients, 2018, ICLR.
[33] Anca D. Dragan, et al. Active Preference-Based Learning of Reward Functions, 2017, Robotics: Science and Systems.
[34] Sergey Levine, et al. RoboNet: Large-Scale Multi-Robot Learning, 2019, CoRL.
[35] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[36] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[37] Prabhat Nagarajan, et al. Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations, 2019, ICML.
[38] Wojciech Zaremba, et al. Domain randomization for transferring deep neural networks from simulation to the real world, 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[39] Jakub W. Pachocki, et al. Dota 2 with Large Scale Deep Reinforcement Learning, 2019, ArXiv.
[40] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[41] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Sergey Levine, et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, 2016, ICML.
[43] Daniel Jurafsky, et al. Lexicon-Free Conversational Speech Recognition with Neural Networks, 2015, NAACL.
[44] Chong Wang, et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, 2015, ICML.
[45] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[46] Dale Schuurmans, et al. Striving for Simplicity in Off-policy Deep Reinforcement Learning, 2019, ArXiv.
[47] Marcin Andrychowicz, et al. Asymmetric Actor Critic for Image-Based Robot Learning, 2017, Robotics: Science and Systems.
[48] Sergey Levine, et al. End-to-End Robotic Reinforcement Learning without Reward Engineering, 2019, Robotics: Science and Systems.
[49] Jackie Kay, et al. Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation, 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).
[50] Rouhollah Rahmatizadeh, et al. Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-to-End Learning from Demonstration, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[51] Sergey Levine, et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, 2016, Int. J. Robotics Res.
[52] Li Fei-Fei, et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Martin A. Riedmiller, et al. Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards, 2017, ArXiv.
[54] Paul J. Werbos, et al. Backpropagation Through Time: What It Does and How to Do It, 1990, Proc. IEEE.
[55] Silvio Savarese, et al. ROBOTURK: A Crowdsourcing Platform for Robotic Skill Learning through Imitation, 2018, CoRL.
[56] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[57] Sergio Gomez Colmenarejo, et al. Acme: A Research Framework for Distributed Reinforcement Learning, 2020, ArXiv.
[58] Abhinav Gupta, et al. Multiple Interactions Made Easy (MIME): Large Scale Demonstrations Data for Imitation, 2018, CoRL.
[59] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, ArXiv.
[60] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[61] Shane Legg, et al. Reward learning from human preferences and demonstrations in Atari, 2018, NeurIPS.
[62] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[63] Martin A. Riedmiller, et al. Regularized Hierarchical Policies for Compositional Transfer in Robotics, 2019, ArXiv.
[64] Michèle Sebag, et al. Programming by Feedback, 2014, ICML.
[65] Nando de Freitas, et al. Active Preference Learning with Discrete Choice Data, 2007, NIPS.
[66] Johannes Fürnkranz, et al. A Survey of Preference-Based Reinforcement Learning Methods, 2017, J. Mach. Learn. Res.
[67] Martin A. Riedmiller, et al. Reinforcement learning in feedback control, 2011, Machine Learning.
[68] Dean Pomerleau, et al. ALVINN, an autonomous land vehicle in a neural network, 2015.
[69] Gabriel Dulac-Arnold, et al. Challenges of Real-World Reinforcement Learning, 2019, ArXiv.
[70] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[71] Yuval Tassa, et al. Learning human behaviors from motion capture by adversarial imitation, 2017, ArXiv.
[72] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2020, ICML.
[73] Oleg O. Sushkov, et al. A Practical Approach to Insertion with Variable Socket Position Using Deep Reinforcement Learning, 2018, 2019 International Conference on Robotics and Automation (ICRA).
[74] Navdeep Jaitly, et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks, 2014, ICML.
[75] Martin A. Riedmiller, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[76] Sergey Levine, et al. Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations, 2017, Robotics: Science and Systems.
[77] Rémi Munos, et al. Recurrent Experience Replay in Distributed Reinforcement Learning, 2018, ICLR.
[78] Morgan Quigley, et al. ROS: an open-source Robot Operating System, 2009, ICRA.
[79] Shie Mannor, et al. End-to-End Differentiable Adversarial Imitation Learning, 2017, ICML.
[80] Takeo Igarashi, et al. Sequential line search for efficient visual design optimization by crowds, 2017, ACM Trans. Graph.