暂无分享,去创建一个
[1] Soon-Jo Chung,et al. Robust Regression for Safe Exploration in Control , 2019, L4DC.
[2] Vikash Kumar,et al. A Game Theoretic Framework for Model Based Reinforcement Learning , 2020, ICML.
[3] Sergey Levine,et al. When to Trust Your Model: Model-Based Policy Optimization , 2019, NeurIPS.
[4] Yuandong Tian,et al. Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees , 2018, ICLR.
[5] Peter L. Bartlett,et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..
[6] Amir-massoud Farahmand,et al. Iterative Value-Aware Model Learning , 2018, NeurIPS.
[7] Ameet Talwalkar,et al. Foundations of Machine Learning , 2012, Adaptive computation and machine learning.
[8] Daniel Nikovski,et al. Value-Aware Loss Function for Model-based Reinforcement Learning , 2017, AISTATS.
[9] Pieter Abbeel,et al. Model-Ensemble Trust-Region Policy Optimization , 2018, ICLR.
[10] Mohammad Ghavamzadeh,et al. Policy-Aware Model Learning for Policy Gradient Methods , 2020, ArXiv.
[11] Thorsten Joachims,et al. MOReL : Model-Based Offline Reinforcement Learning , 2020, NeurIPS.
[12] Tamim Asfour,et al. Model-Based Reinforcement Learning via Meta-Policy Optimization , 2018, CoRL.
[13] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[14] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[15] Hoang Minh Le,et al. Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning , 2019, NeurIPS Datasets and Benchmarks.
[16] Masatoshi Uehara,et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation , 2019, ICML.
[17] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[18] Yuhong Yang,et al. Information Theory, Inference, and Learning Algorithms , 2005 .
[19] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[20] David K. Smith,et al. Dynamic Programming and Optimal Control. Volume 1 , 1996 .
[21] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[22] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[23] Qiang Liu,et al. A Kernel Loss for Solving the Bellman Equation , 2019, NeurIPS.
[24] Florian Schäfer,et al. Competitive Gradient Descent , 2019, NeurIPS.
[25] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[26] Qiang Liu,et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation , 2018, NeurIPS.
[27] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[28] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[29] Sergey Levine,et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , 2018, NeurIPS.