暂无分享,去创建一个
[1] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[2] Andreas Krause,et al. Information Directed Sampling and Bandits with Heteroscedastic Noise , 2018, COLT.
[3] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[4] Pieter Abbeel,et al. Model-Augmented Actor-Critic: Backpropagating through Paths , 2020, ICLR.
[5] David K. Smith,et al. Dynamic Programming and Optimal Control. Volume 1 , 1996 .
[6] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[7] Michal Valko,et al. Regret Bounds for Kernel-Based Reinforcement Learning , 2020, ArXiv.
[8] T. Blumensath,et al. Theory and Applications , 2011 .
[9] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[10] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[11] Dirk P. Kroese,et al. The cross-entropy method for estimation , 2013 .
[12] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[13] Charles Blundell,et al. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.
[14] Aditya Gopalan,et al. On Kernelized Multi-armed Bandits , 2017, ICML.
[15] Gergely Neu,et al. A Unifying View of Optimism in Episodic Reinforcement Learning , 2020, NeurIPS.
[16] D. Jacobson. New second-order and first-order algorithms for determining optimal control: A differential dynamic programming approach , 1968 .
[17] Akshay Krishnamurthy,et al. Information Theoretic Regret Bounds for Online Nonlinear Control , 2020, NeurIPS.
[18] Sergey Levine,et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , 2018, NeurIPS.
[19] Yasin Abbasi-Yadkori,et al. Thompson Sampling and Approximate Inference , 2019, NeurIPS.
[20] Marc Peter Deisenroth,et al. Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control , 2017, AISTATS.
[21] Yuandong Tian,et al. Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees , 2018, ICLR.
[22] Benjamin Van Roy,et al. Ensemble Sampling , 2017, NIPS.
[23] Dirk P. Kroese,et al. Chapter 3 – The Cross-Entropy Method for Optimization , 2013 .
[24] Csaba Szepesvári,et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems , 2011, COLT.
[25] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[26] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[27] Zi Wang,et al. Batched Large-scale Bayesian Optimization in High-dimensional Spaces , 2017, AISTATS.
[28] Emma Brunskill,et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds , 2019, ICML.
[29] E. Todorov,et al. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems , 2005, Proceedings of the 2005, American Control Conference, 2005..
[30] Andreas Krause,et al. Contextual Gaussian Process Bandit Optimization , 2011, NIPS.
[31] Jan Peters,et al. Model-based Lookahead Reinforcement Learning , 2019, ArXiv.
[32] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[33] A. Kiureghian,et al. Aleatory or epistemic? Does it matter? , 2009 .
[34] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[35] Michael Figurnov,et al. Monte Carlo Gradient Estimation in Machine Learning , 2019, J. Mach. Learn. Res..
[36] Andreas Krause,et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting , 2009, IEEE Transactions on Information Theory.
[37] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[38] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[39] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[40] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[41] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[42] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[43] Carl E. Rasmussen,et al. PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos , 2019, ICML.
[44] Andreas Krause,et al. Efficient High Dimensional Bayesian Optimization with Additivity and Quadrature Fourier Features , 2018, NeurIPS.
[45] C. Rasmussen,et al. Improving PILCO with Bayesian Neural Network Dynamics Models , 2016 .
[46] Jonathan P. How,et al. Robust variable horizon model predictive control for vehicle maneuvering , 2006 .
[47] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation , 2019, COLT.
[48] Martial Hebert,et al. Improved Learning of Dynamics Models for Control , 2016, ISER.
[49] Jay H. Lee,et al. Model predictive control: past, present and future , 1999 .
[50] Il Memming Park,et al. BLACK BOX VARIATIONAL INFERENCE FOR STATE SPACE MODELS , 2015, 1511.07367.
[51] Shie Mannor,et al. Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies , 2019, NeurIPS.
[52] Sergey Levine,et al. Optimism-driven exploration for nonlinear systems , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).
[53] Pieter Abbeel,et al. Model-Ensemble Trust-Region Policy Optimization , 2018, ICLR.
[54] Jimmy Ba,et al. Exploring Model-based Planning with Policy Networks , 2019, ICLR.
[55] Stefano Ermon,et al. Accurate Uncertainties for Deep Learning Using Calibrated Regression , 2018, ICML.
[56] Andreas Krause,et al. Structured Variational Inference in Unstable Gaussian Process State Space Models , 2019, ArXiv.
[57] Alessandro Lazaric,et al. Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation , 2020, ICML.
[58] Lukas Hewing,et al. On Simulation and Trajectory Prediction with Gaussian Process Dynamics , 2020, L4DC.
[59] Aditya Gopalan,et al. Online Learning in Kernelized Markov Decision Processes , 2019, AISTATS.
[60] Volkan Cevher,et al. Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization , 2017, COLT.
[61] Duy Nguyen-Tuong,et al. Probabilistic Recurrent State-Space Models , 2018, ICML.
[62] Emanuel Todorov,et al. Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems , 2004, ICINCO.
[63] Sergey Levine,et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning , 2018, ArXiv.
[64] Yuval Tassa,et al. Synthesis and stabilization of complex behaviors through online trajectory optimization , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[65] Andreas Christmann,et al. Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.
[66] Tamim Asfour,et al. Model-Based Reinforcement Learning via Meta-Policy Optimization , 2018, CoRL.
[67] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[68] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[69] Yuval Tassa,et al. DeepMind Control Suite , 2018, ArXiv.
[70] Carl E. Rasmussen,et al. Gaussian Processes for Data-Efficient Learning in Robotics and Control , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[71] Felix Berkenkamp,et al. Safe Exploration in Reinforcement Learning: Theory and Applications in Robotics , 2019 .
[72] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[73] Roman Vershynin,et al. Introduction to the non-asymptotic analysis of random matrices , 2010, Compressed Sensing.
[74] Sergey Levine,et al. Model-Based Reinforcement Learning for Atari , 2019, ICLR.
[75] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[76] Sham M. Kakade,et al. Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control , 2018, ICLR.
[77] Sergey Levine,et al. When to Trust Your Model: Model-Based Policy Optimization , 2019, NeurIPS.
[78] Andreas Krause,et al. No-regret Bayesian Optimization with Unknown Hyperparameters , 2019, J. Mach. Learn. Res..
[79] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[80] Benjamin Van Roy,et al. Generalization and Exploration via Randomized Value Functions , 2014, ICML.
[81] Yarin Gal,et al. Uncertainty in Deep Learning , 2016 .
[82] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[83] Gabriel Kalweit,et al. Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning , 2017, CoRL.
[84] Matteo Hessel,et al. When to use parametric models in reinforcement learning? , 2019, NeurIPS.
[85] Csaba Szepesvari,et al. Online learning for linearly parametrized control problems , 2012 .
[86] Honglak Lee,et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion , 2018, NeurIPS.
[87] Sandra Hirche,et al. Uniform Error Bounds for Gaussian Process Regression with Application to Safe Control , 2019, NeurIPS.
[88] Nando de Freitas,et al. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning , 2010, ArXiv.
[89] Stefano Ermon,et al. Calibrated Model-Based Deep Reinforcement Learning , 2019, ICML.
[90] Benjamin Van Roy,et al. (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.
[91] Sergey Levine,et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[92] David Andre,et al. Model based Bayesian Exploration , 1999, UAI.
[93] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[94] Dino Sejdinovic,et al. Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences , 2018, ArXiv.
[95] James M. Rehg,et al. Aggressive driving with model predictive path integral control , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).
[96] Peter L. Bartlett,et al. Neural Network Learning - Theoretical Foundations , 1999 .
[97] Adam D. Bull,et al. Convergence Rates of Efficient Global Optimization Algorithms , 2011, J. Mach. Learn. Res..
[98] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[99] Benjamin Recht,et al. Certainty Equivalence is Efficient for Linear Quadratic Control , 2019, NeurIPS.