Yuandong Tian | Trevor Darrell | Yuanzhi Li | Tengyu Ma | Huazhe Xu