Mismatched No More: Joint Model-Policy Optimization for Model-Based RL