Policy Optimization by Local Improvement through Search
Jialin Song | Joe Wenjie Jiang | Amir Yazdanbakhsh | Ebrahim M. Songhori | Anna Goldie | Navdeep Jaitly | Azalia Mirhoseini
[1] Mayank Bansal, et al. ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst, 2018, Robotics: Science and Systems.
[2] Jürgen Schmidhuber, et al. A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots, 2016, IEEE Robotics and Automation Letters.
[3] Martin A. Riedmiller, et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images, 2015, NIPS.
[4] Tamim Asfour, et al. Model-Based Reinforcement Learning via Meta-Policy Optimization, 2018, CoRL.
[5] Stefano Ermon, et al. Generative Adversarial Imitation Learning, 2016, NIPS.
[6] J. Andrew Bagnell, et al. Efficient Reductions for Imitation Learning, 2010, AISTATS.
[7] Sergey Levine, et al. Causal Confusion in Imitation Learning, 2019, NeurIPS.
[8] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[9] Trevor Darrell, et al. Monocular Plan View Networks for Autonomous Driving, 2019, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[10] R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[11] Sergey Levine, et al. End-to-End Robotic Reinforcement Learning without Reward Engineering, 2019, Robotics: Science and Systems.
[12] Sergey Levine, et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, 2016, ICML.
[13] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[14] Andrew Y. Ng, et al. Shaping and policy search in reinforcement learning, 2003.
[15] Sergey Levine, et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, 2017, ICLR.
[16] Michael S. Ryoo, et al. Learning Real-World Robot Policies by Dreaming, 2019, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[17] Jan Willemson, et al. Improved Monte-Carlo Search, 2006.
[18] Xin Zhang, et al. End to End Learning for Self-Driving Cars, 2016, arXiv.
[19] Sergey Levine, et al. When to Trust Your Model: Model-Based Policy Optimization, 2019, NeurIPS.
[20] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[21] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[22] Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning, 2018.
[23] David Barber, et al. Thinking Fast and Slow with Deep Learning and Tree Search, 2017, NIPS.
[24] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[25] Dean Pomerleau, et al. Efficient Training of Artificial Neural Networks for Autonomous Navigation, 1991, Neural Computation.
[26] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[27] Kenneth Y. Goldberg, et al. Learning Deep Policies for Robot Bin Picking by Simulating Robust Grasping Sequences, 2017, CoRL.
[28] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[29] H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985.
[30] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[31] Stefan Schaal, et al. Is imitation learning the route to humanoid robots?, 1999, Trends in Cognitive Sciences.
[32] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[33] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[34] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[35] Anca D. Dragan, et al. DART: Noise Injection for Robust Imitation Learning, 2017, CoRL.
[36] Oliver Kroemer, et al. Learning to select and generalize striking movements in robot table tennis, 2012, AAAI Fall Symposium: Robots Learning Interactively from Human Teachers.
[37] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[38] John Langford, et al. Search-based structured prediction, 2009, Machine Learning.
[39] Honglak Lee, et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, 2014, NIPS.
[40] Pieter Abbeel, et al. Third-Person Imitation Learning, 2017, ICLR.
[41] Byron Boots, et al. Predictor-Corrector Policy Optimization, 2018, ICML.
[42] Sergey Levine, et al. Deep visual foresight for planning robot motion, 2017, IEEE International Conference on Robotics and Automation (ICRA).
[43] H. Jaap van den Herik, et al. Parallel Monte-Carlo Tree Search, 2008, Computers and Games.
[44] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[45] Byron Boots, et al. Dual Policy Iteration, 2018, NeurIPS.
[46] Dean Pomerleau. ALVINN: An Autonomous Land Vehicle in a Neural Network, 1989, NIPS.
[47] Yann LeCun, et al. Off-Road Obstacle Avoidance through End-to-End Learning, 2005, NIPS.
[48] Ruben Villegas, et al. Learning Latent Dynamics for Planning from Pixels, 2018, ICML.
[49] Sergey Levine, et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates, 2017, IEEE International Conference on Robotics and Automation (ICRA).