Junhyuk Oh | Nir Levine | Zhongwen Xu | Daniel J. Mankowitz | Timothy Mann | Tom Zahavy | Dan A. Calian
[1] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[2] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[3] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[4] Richard L. Lewis,et al. Discovery of Useful Questions as Auxiliary Tasks , 2019, NeurIPS.
[5] Bruno Castro da Silva,et al. On Ensuring that Intelligent Machines Are Well-Behaved , 2017, ArXiv.
[6] Raia Hadsell,et al. Value constrained model-free continuous control , 2019, ArXiv.
[7] Paolo Frasconi,et al. Forward and Reverse Gradient-Based Hyperparameter Optimization , 2017, ICML.
[8] Kalyanmoy Deb,et al. A Review on Bilevel Optimization: From Classical to Evolutionary Approaches and Applications , 2017, IEEE Transactions on Evolutionary Computation.
[9] Nir Levine,et al. An empirical investigation of the challenges of real-world reinforcement learning , 2020, ArXiv.
[10] David Silver,et al. Meta-Gradient Reinforcement Learning , 2018, NeurIPS.
[11] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[12] Alejandro Ribeiro,et al. Constrained Reinforcement Learning Has Zero Duality Gap , 2019, NeurIPS.
[13] Yuval Tassa,et al. DeepMind Control Suite , 2018, ArXiv.
[14] Matthew E. Taylor,et al. Metatrace Actor-Critic: Online Step-Size Tuning by Meta-gradient Descent for Reinforcement Learning Control , 2018, IJCAI.
[15] Shie Mannor,et al. A Deep Hierarchical Approach to Lifelong Learning in Minecraft , 2016, AAAI.
[16] Yuval Tassa,et al. Relative Entropy Regularized Policy Iteration , 2018, ArXiv.
[17] Dario Amodei,et al. Benchmarking Safe Exploration in Deep Reinforcement Learning , 2019.
[18] Gabriel Dulac-Arnold,et al. Challenges of Real-World Reinforcement Learning , 2019, ArXiv.
[19] Satinder Singh,et al. On Learning Intrinsic Rewards for Policy Gradient Methods , 2018, NeurIPS.
[20] Shie Mannor,et al. Reward Constrained Policy Optimization , 2018, ICLR.
[21] Hongxia Jin,et al. Reward Constrained Interactive Recommendation with Natural Language Feedback , 2020, ArXiv.
[22] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[23] E. Altman. Constrained Markov Decision Processes , 1999.
[24] Matthew W. Hoffman,et al. Distributed Distributional Deterministic Policy Gradients , 2018, ICLR.
[25] Joelle Pineau,et al. Benchmarking Batch Deep Reinforcement Learning Algorithms , 2019, ArXiv.
[26] Junhyuk Oh,et al. Self-Tuning Deep Reinforcement Learning , 2020, ArXiv.
[27] Shie Mannor,et al. Exploration-Exploitation in Constrained MDPs , 2020, ArXiv.
[28] Joelle Pineau,et al. Constrained Markov Decision Processes via Backward Value Functions , 2020, ICML.
[29] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.