A Lyapunov-based Approach to Safe Reinforcement Learning
Yinlam Chow | Ofir Nachum | Edgar A. Duéñez-Guzmán | Mohammad Ghavamzadeh
[1] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[2] Eitan Altman, et al. Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program, 1998, Math. Methods Oper. Res.
[3] Zoltán Gábor, et al. Multi-criteria Reinforcement Learning, 1998, ICML.
[4] E. Altman. Constrained Markov Decision Processes, 1999.
[5] Andrew G. Barto, et al. Lyapunov Design for Safe Reinforcement Learning, 2003, J. Mach. Learn. Res.
[6] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[7] Guy Shani, et al. An MDP-Based Recommender System, 2002, J. Mach. Learn. Res.
[8] Fritz Wysotzki, et al. Risk-Sensitive Reinforcement Learning Applied to Control under Constraints, 2005, J. Artif. Intell. Res.
[9] Stephen P. Boyd, et al. Convex Optimization, 2004.
[10] Michael Schmitt, et al. On the Complexity of Learning Lexicographic Strategies, 2006, J. Mach. Learn. Res.
[11] P. Glynn, et al. Bounding Stationary Expectations of Markov Processes, 2008.
[12] Craig Boutilier, et al. Regret-based Reward Elicitation for Markov Decision Processes, 2009, UAI.
[13] Naoki Abe, et al. Optimizing debt collections using constrained reinforcement learning, 2010, KDD.
[14] Hado van Hasselt. Double Q-learning, 2010, NIPS.
[15] Mihaela van der Schaar, et al. Fast Reinforcement Learning for Energy-Efficient Wireless Communication, 2010, IEEE Transactions on Signal Processing.
[16] Pieter Abbeel, et al. Safe Exploration in Markov Decision Processes, 2012, ICML.
[17] Shie Mannor, et al. Policy Gradients with Variance Related Risk Criteria, 2012, ICML.
[18] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[19] Shimon Whiteson, et al. A Survey of Multi-Objective Sequential Decision-Making, 2013, J. Artif. Intell. Res.
[20] Bruno Scherrer, et al. Performance bounds for λ policy iteration and application to the game of Tetris, 2013, J. Mach. Learn. Res.
[21] Luca Bascetta, et al. Adaptive Step-Size for Policy Gradient Methods, 2013, NIPS.
[22] Daniele Calandriello, et al. Safe Policy Iteration, 2013, ICML.
[23] Marco Pavone, et al. Chance-constrained dynamic programming with application to risk-aware robotic space exploration, 2015, Autonomous Robots.
[24] Brian M. Sadler, et al. Trading Safety Versus Performance: Rapid Deployment of Robotic Swarms with Robust Performance Constraints, 2015, arXiv.
[25] Razvan Pascanu, et al. Policy Distillation, 2015, ICLR.
[26] Behçet Açikmese, et al. Convex synthesis of randomized policies for controlled Markov chains with density safety upper bound constraints, 2016, American Control Conference (ACC).
[27] Shimon Whiteson, et al. Multi-Objective Deep Reinforcement Learning, 2016, arXiv.
[28] Sebastian Junges, et al. Safety-Constrained Reinforcement Learning for MDPs, 2015, TACAS.
[29] David G. Luenberger, et al. Linear and Nonlinear Programming, 2016.
[30] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[31] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[32] John Schulman, et al. Concrete Problems in AI Safety, 2016, arXiv.
[33] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, arXiv.
[34] Marco Pavone, et al. Risk-Constrained Reinforcement Learning with Percentile Risk Criteria, 2015, J. Mach. Learn. Res.
[35] Pieter Abbeel, et al. Constrained Policy Optimization, 2017, ICML.
[36] Andreas Krause, et al. Safe Model-based Reinforcement Learning with Stability Guarantees, 2017, NIPS.
[37] Laurent Orseau, et al. AI Safety Gridworlds, 2017, arXiv.
[38] Alessandro Lazaric, et al. Exploration-Exploitation in MDPs with Options, 2016.
[39] Yuval Tassa, et al. Safe Exploration in Continuous Action Spaces, 2018, arXiv.
[40] Michael I. Jordan, et al. First-order methods almost always avoid saddle points: The case of vanishing step-sizes, 2019, NeurIPS.