Marcello Restelli | Andrea Tirinzoni | Alberto Maria Metelli | Matteo Papini | Pierluca D'Oro
[1] Marcello Restelli et al. Policy Optimization via Importance Sampling, 2018, NeurIPS.
[2] Yoshua Bengio et al. Using a Financial Training Criterion Rather than a Prediction Criterion, 1997, Int. J. Neural Syst.
[3] G. G. Stokes. "J.", 1890, The New Yale Book of Quotations.
[4] Jan Peters et al. Model learning for robot control: a survey, 2011, Cognitive Processing.
[5] Sergey Levine et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning, 2018, ArXiv.
[6] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[7] Alborz Geramifard et al. Reinforcement learning with misspecified model classes, 2013, 2013 IEEE International Conference on Robotics and Automation.
[8] Tom Schaul et al. The Predictron: End-To-End Learning and Planning, 2016, ICML.
[9] Yuval Tassa et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[10] Yishay Mansour et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[11] Amir-massoud Farahmand et al. Iterative Value-Aware Model Learning, 2018, NeurIPS.
[12] Kavosh Asadi et al. Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning, 2018, ArXiv.
[13] H. Kahn et al. Methods of Reducing Sample Size in Monte Carlo Computations, 1953, Oper. Res.
[14] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[15] Yuval Tassa et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[16] Sergey Levine et al. Model-Based Reinforcement Learning for Atari, 2019, ICLR.
[17] L. Piroddi et al. An identification algorithm for polynomial NARX models based on simulation error minimization, 2003.
[18] Sergey Levine et al. When to Trust Your Model: Model-Based Policy Optimization, 2019, NeurIPS.
[19] Model-Ensemble Trust-Region Policy Optimization, 2017.
[20] Kavosh Asadi et al. Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning, 2018, ArXiv.
[21] Carl E. Rasmussen et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[22] Jan Peters et al. Model Learning with Local Gaussian Process Regression, 2009, Adv. Robotics.
[23] Sergey Levine et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, 2014, NIPS.
[24] Marcello Restelli et al. Transfer of Samples in Policy Search via Multiple Importance Sampling, 2019, ICML.
[25] Jürgen Schmidhuber et al. Recurrent World Models Facilitate Policy Evolution, 2018, NeurIPS.
[26] Honglak Lee et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, 2018, NeurIPS.
[27] Mehryar Mohri et al. Relative deviation learning bounds and generalization with unbounded loss functions, 2013, Annals of Mathematics and Artificial Intelligence.
[28] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[29] Pierre Geurts et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[30] Stefan Schaal et al. Reinforcement learning by reward-weighted regression for operational space control, 2007, ICML '07.
[31] Jan Peters et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[32] P ? ? ? ? ? ? ? % ? ? ? ?, 1991.
[33] Pieter Abbeel et al. Using inaccurate models in reinforcement learning, 2006, ICML.
[34] Doina Precup et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[35] Xin Wang et al. Model-based Policy Gradient Reinforcement Learning, 2003, ICML.
[36] Sergey Levine et al. Goal-driven dynamics learning via Bayesian optimization, 2017, 2017 IEEE 56th Annual Conference on Decision and Control (CDC).
[37] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[38] A. R. Penner et al. The physics of putting, 2002.
[39] R. Bellman. On the Theory of Dynamic Programming, 1952, Proceedings of the National Academy of Sciences of the United States of America.
[40] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[41] Danna Zhou et al. d., 1934, Microbial pathogenesis.
[42] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[43] Jürgen Schmidhuber et al. Long Short-Term Memory, 1997, Neural Computation.
[44] Jun Morimoto et al. Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation, 2013, Neural Networks.
[45] Byron Boots et al. Differentiable MPC for End-to-end Planning and Control, 2018, NeurIPS.
[46] J. Andrew Bagnell et al. Agnostic System Identification for Model-Based Reinforcement Learning, 2012, ICML.
[47] Max Welling et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[48] Masashi Sugiyama et al. Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation, 2013.
[49] Peter L. Bartlett et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[50] C. Rasmussen et al. Improving PILCO with Bayesian Neural Network Dynamics Models, 2016.
[51] Marcello Restelli et al. Configurable Markov Decision Processes, 2018, ICML.
[52] Daniel Nikovski et al. Value-Aware Loss Function for Model-based Reinforcement Learning, 2017, AISTATS.
[53] Tsuyoshi Murata et al. {m, 1934, ACML.
[54] Priya L. Donti et al. Task-based End-to-end Model Learning in Stochastic Optimization, 2017, NIPS.
[55] Satinder Singh et al. Value Prediction Network, 2017, NIPS.
[56] Andrea Bonarini et al. Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods, 2007, NIPS.
[57] Pieter Abbeel et al. Model-Ensemble Trust-Region Policy Optimization, 2018, ICLR.
[58] David K. Smith et al. Dynamic Programming and Optimal Control. Volume 1, 1996.
[59] Yuandong Tian et al. Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees, 2018, ICLR.
[61] Martin A. Riedmiller et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[62] Sergey Levine et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, 2018, NeurIPS.