Prasad Tadepalli | Alexander Matt Turner | Neale Ratzlaff
[1] E. Altman. Constrained Markov Decision Processes, 1999.
[2] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[3] Andreas Krause,et al. Safe Model-based Reinforcement Learning with Stability Guarantees , 2017, NIPS.
[4] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[5] Tom Schaul,et al. Universal Value Function Approximators , 2015, ICML.
[6] Kevin B. Korb,et al. The Frame Problem: An AI Fairy Tale , 1998, Minds and Machines.
[7] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[8] Anca D. Dragan,et al. Inverse Reward Design , 2017, NIPS.
[9] Joel Lehman,et al. Towards Empathic Deep Q-Learning , 2019, AISafety@IJCAI.
[10] Peter Eckersley,et al. SafeLife 1.0: Exploring Side Effects in Complex Environments , 2019, SafeAI@AAAI.
[11] John P. Cunningham,et al. The continuous Bernoulli: fixing a pervasive error in variational autoencoders , 2019, NeurIPS.
[12] Dylan Hadfield-Menell,et al. Conservative Agency via Attainable Utility Preservation , 2019, AIES.
[13] F. Brown. The frame problem in artificial intelligence, 1987.
[14] Martin Gardner. Mathematical Games: The fantastic combinations of John Conway's new solitaire game "life", 1970.
[15] Tomás Svoboda,et al. Safe Exploration Techniques for Reinforcement Learning - An Overview , 2014, MESAS.
[16] Javier García,et al. A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res.
[17] Laurent Orseau,et al. Reinforcement Learning with a Corrupted Reward Channel , 2017, IJCAI.
[18] Ofir Nachum,et al. A Lyapunov-based Approach to Safe Reinforcement Learning , 2018, NeurIPS.
[19] Shane Legg,et al. Deep Reinforcement Learning from Human Preferences , 2017, NIPS.
[20] Laurent Orseau,et al. AI Safety Gridworlds , 2017, ArXiv.
[21] Joelle Pineau,et al. Benchmarking Batch Deep Reinforcement Learning Algorithms , 2019, ArXiv.
[22] Paul W. Rendell,et al. Turing Universality of the Game of Life , 2002, Collision-Based Computing.
[23] Edmund H. Durfee,et al. Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes , 2018, IJCAI.
[24] Craig Boutilier,et al. Robust Policy Computation in Reward-Uncertain MDPs Using Nondominated Policies , 2010, AAAI.
[25] Sergey Levine,et al. Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning , 2017, ICLR.
[26] Frank Markham Brown. The Frame Problem in Artificial Intelligence: Proceedings of the 1987 Workshop, April 12-15, 1987, Lawrence, Kansas, 1987.
[27] Alexander Matt Turner. Optimal Farsighted Agents Tend to Seek Power , 2019, ArXiv.
[28] Laurent Orseau,et al. Measuring and avoiding side effects using relative reachability , 2018, ArXiv.