Pieter Abbeel | Guodong Zhang | Jimmy Ba | Tingwu Wang | Yeming Wen | Eric Langlois | Jerrick Hoang | Xuchan Bao | Ignasi Clavera | Shunshi Zhang
[1] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[2] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[3] Sergey Levine, et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, 2014, NIPS.
[4] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.
[5] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[6] Yuval Tassa, et al. DeepMind Control Suite, 2018, ArXiv.
[7] Carl E. Rasmussen, et al. Gaussian Processes for Data-Efficient Learning in Robotics and Control, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[8] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[9] Richard S. Sutton, et al. Planning by Incremental Dynamic Programming, 1991, ML.
[10] Nolan Wagener, et al. Learning contact-rich manipulation skills with guided policy search, 2015, IEEE International Conference on Robotics and Automation (ICRA).
[11] Yuval Tassa, et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[12] Yuval Tassa, et al. Emergence of Locomotion Behaviours in Rich Environments, 2017, ArXiv.
[13] Arthur G. Richards, et al. Robust constrained model predictive control, 2005.
[14] Sergey Levine, et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, 2018, IEEE International Conference on Robotics and Automation (ICRA).
[15] David Andre, et al. Model based Bayesian Exploration, 1999, UAI.
[16] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Sergey Levine, et al. Guided Policy Search via Approximate Mirror Descent, 2016, NIPS.
[18] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[19] Marc Peter Deisenroth, et al. Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control, 2017, AISTATS.
[20] Yuandong Tian, et al. Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees, 2018, ICLR.
[21] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[22] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[23] Sergey Levine, et al. Path integral guided policy search, 2017, IEEE International Conference on Robotics and Automation (ICRA).
[24] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[25] Chris Callison-Burch, et al. Open Source Toolkit for Statistical Machine Translation: Factored Translation Models and Lattice Decoding, 2006.
[26] Anil V. Rao, et al. A Survey of Numerical Methods for Optimal Control (preprint AAS 09-334), 2009.
[27] Dirk P. Kroese, et al. Chapter 3 – The Cross-Entropy Method for Optimization, 2013.
[28] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[29] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res.
[30] Alberto Bemporad, et al. The explicit linear quadratic regulator for constrained systems, 2003, Autom.
[31] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[32] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[33] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[34] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[35] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[36] Dirk P. Kroese, et al. The cross-entropy method for estimation, 2013.
[37] Tamim Asfour, et al. Model-Based Reinforcement Learning via Meta-Policy Optimization, 2018, CoRL.
[38] Razvan Pascanu, et al. Ray Interference: a Source of Plateaus in Deep Reinforcement Learning, 2019, ArXiv.
[39] Jürgen Schmidhuber, et al. Efficient model-based exploration, 1998.
[40] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[41] Peter Henderson, et al. Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control, 2017, ArXiv.
[42] Yuval Tassa, et al. Synthesis and stabilization of complex behaviors through online trajectory optimization, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[43] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[44] Sergey Levine, et al. DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills, 2018, ACM Trans. Graph.
[45] Richard S. Sutton, et al. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SIGART Bull.
[46] Pieter Abbeel, et al. Model-Ensemble Trust-Region Policy Optimization, 2018, ICLR.
[47] Shie Mannor, et al. A Tutorial on the Cross-Entropy Method, 2005, Ann. Oper. Res.
[48] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[49] Sergey Levine, et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, 2018, NeurIPS.
[50] Sergey Levine, et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, 2016, ICML.
[51] Sergey Levine, et al. Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search, 2016, IEEE International Conference on Robotics and Automation (ICRA).
[52] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.