暂无分享,去创建一个
[1] Javier García,et al. A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res..
[2] Lihong Li,et al. Provable Optimal Algorithms for Generalized Linear Contextual Bandits , 2017, ArXiv.
[3] Aurélien Garivier,et al. Parametric Bandits: The Generalized Linear Case , 2010, NIPS.
[4] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[5] Yongshuai Liu,et al. IPO: Interior-point Policy Optimization under Constraints , 2019, AAAI.
[6] Michael L. Littman,et al. A theoretical analysis of Model-Based Interval Estimation , 2005, ICML.
[7] Matthieu Geist,et al. Approximate Modified Policy Iteration , 2012, ICML.
[8] Christos Thrampoulidis,et al. Linear Stochastic Bandits Under Safety Constraints , 2019, NeurIPS.
[9] Andreas Krause,et al. Safe Exploration in Finite Markov Decision Processes with Gaussian Processes , 2016, NIPS.
[10] Nathan Fulton,et al. Safe Reinforcement Learning via Formal Methods: Toward Safe Control Through Proof and Learning , 2018, AAAI.
[11] Felix Berkenkamp,et al. Safe Exploration for Interactive Machine Learning , 2019, NeurIPS.
[12] Luiz F. O. Chamon,et al. Safe Policies for Reinforcement Learning via Primal-Dual Methods , 2019, IEEE Transactions on Automatic Control.
[13] Dorsa Sadigh,et al. Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models , 2019, 2019 American Control Conference (ACC).
[14] Xiaohan Wei,et al. Provably Efficient Safe Exploration via Primal-Dual Policy Optimization , 2021, AISTATS.
[15] Ruosong Wang,et al. Optimism in Reinforcement Learning with Generalized Linear Function Approximation , 2019, ICLR.
[16] E. Altman. Constrained Markov Decision Processes , 1999 .
[17] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[18] Marco Pavone,et al. Risk-Constrained Reinforcement Learning with Percentile Risk Criteria , 2015, J. Mach. Learn. Res..
[19] Yanan Sui,et al. Safe Reinforcement Learning in Constrained Markov Decision Processes , 2020, ICML.
[20] Benjamin Van Roy,et al. Generalization and Exploration via Randomized Value Functions , 2014, ICML.
[21] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation , 2019, COLT.
[22] Mengdi Wang,et al. Reinforcement Leaning in Feature Space: Matrix Bandit, Kernels, and Regret Bound , 2019, ICML.
[23] Andreas Krause,et al. Safe Model-based Reinforcement Learning with Stability Guarantees , 2017, NIPS.
[24] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[25] Vivek S. Borkar,et al. An actor-critic algorithm for constrained Markov decision processes , 2005, Syst. Control. Lett..
[26] Dario Amodei,et al. Benchmarking Safe Exploration in Deep Reinforcement Learning , 2019 .
[27] Ufuk Topcu,et al. Safe Reinforcement Learning via Shielding , 2017, AAAI.
[28] Joelle Pineau,et al. Constrained Markov Decision Processes via Backward Value Functions , 2020, ICML.
[29] Gábor Orosz,et al. End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks , 2019, AAAI.
[30] Shalabh Bhatnagar,et al. An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes , 2012, J. Optim. Theory Appl..
[31] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[32] Alkis Gotovos,et al. Safe Exploration for Optimization with Gaussian Processes , 2015, ICML.
[33] Yisong Yue,et al. Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes , 2018, AAAI.
[34] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[35] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[36] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[37] Shie Mannor,et al. Reward Constrained Policy Optimization , 2018, ICLR.
[38] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.