[1] Shalabh Bhatnagar,et al. An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes, 2012, J. Optim. Theory Appl..
[2] Francesco Orabona. A Modern Introduction to Online Learning, 2019, ArXiv.
[3] Emma Brunskill,et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds , 2019, ICML.
[4] Lillian J. Ratliff,et al. Constrained Upper Confidence Reinforcement Learning , 2020, L4DC.
[5] Csaba Szepesvári,et al. Tuning Bandit Algorithms in Stochastic Environments , 2007, ALT.
[6] Yisong Yue,et al. Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes , 2018, AAAI.
[7] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[8] Mohammad Ghavamzadeh,et al. Lyapunov-based Safe Policy Optimization for Continuous Control , 2019, ArXiv.
[9] Tao Qin,et al. Multi-Armed Bandit with Budget Constraint and Variable Costs , 2013, AAAI.
[10] Xiaohan Wei,et al. Online Convex Optimization with Stochastic Constraints , 2017, NIPS.
[11] Yifan Wu,et al. Conservative Bandits , 2016, ICML.
[12] Yishay Mansour,et al. Online Convex Optimization in Adversarial Markov Decision Processes , 2019, ICML.
[13] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[14] Gergely Neu,et al. Online learning in episodic Markovian decision processes by relative entropy policy search , 2013, NIPS.
[15] Chi Jin,et al. Provably Efficient Exploration in Policy Optimization , 2020, ICML.
[16] Marco Pavone,et al. Risk-Constrained Reinforcement Learning with Percentile Risk Criteria , 2015, J. Mach. Learn. Res..
[17] Angelia Nedic,et al. Subgradient Methods for Saddle-Point Problems , 2009, J. Optim. Theory Appl..
[18] E. Altman. Constrained Markov Decision Processes, 1999.
[19] Shie Mannor,et al. Optimistic Policy Optimization with Bandit Feedback , 2020, ICML.
[20] Benjamin Van Roy,et al. Conservative Contextual Linear Bandits , 2016, NIPS.
[21] Haipeng Luo,et al. Learning Adversarial MDPs with Bandit Feedback and Unknown Transition , 2019, ArXiv.
[22] Alessandro Lazaric,et al. Conservative Exploration in Reinforcement Learning , 2020, AISTATS.
[23] Alessandro Lazaric,et al. Improved Algorithms for Conservative Exploration in Bandits , 2020, AAAI.
[24] Amir Beck,et al. First-Order Methods in Optimization, 2017.
[25] Tor Lattimore,et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning , 2017, NIPS.
[26] Jeffrey P. Kharoufeh,et al. Linear programming formulation for non-stationary, finite-horizon Markov decision process models , 2017, Oper. Res. Lett..
[27] Aaron Roth,et al. Fairness in Learning: Classic and Contextual Bandits , 2016, NIPS.
[28] Nikhil R. Devanur,et al. Bandits with Global Convex Constraints and Objective , 2019, Oper. Res..
[29] Torsten Koller,et al. Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning , 2019, ArXiv.
[30] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[31] Marc Teboulle,et al. Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett..
[32] Aleksandrs Slivkins,et al. Bandits with Knapsacks , 2013, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science.
[33] Rong Jin,et al. Trading regret for efficiency: online convex optimization with long term constraints , 2011, J. Mach. Learn. Res..
[34] Christoph Dann,et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning , 2015, NIPS.
[35] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[36] Ofir Nachum,et al. A Lyapunov-based Approach to Safe Reinforcement Learning , 2018, NeurIPS.
[37] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[38] John N. Tsitsiklis,et al. On the Empirical State-Action Frequencies in Markov Decision Processes Under General Policies , 2005, Math. Oper. Res..
[39] Shie Mannor,et al. Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies , 2019, NeurIPS.
[40] Andreas Krause,et al. Safe Model-based Reinforcement Learning with Stability Guarantees , 2017, NIPS.
[41] Shie Mannor,et al. Reward Constrained Policy Optimization , 2018, ICLR.
[42] Massimiliano Pontil,et al. Empirical Bernstein Bounds and Sample-Variance Penalization , 2009, COLT.
[43] R. Srikant,et al. Bandits with Budgets , 2015, SIGMETRICS.
[44] Nikhil R. Devanur,et al. Bandits with concave rewards and convex knapsacks , 2014, EC.
[45] D. Bertsekas,et al. Alternative theoretical frameworks for finite horizon discrete-time stochastic optimal control , 1977, 1977 IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and A Special Symposium on Fuzzy Set Theory and Applications.
[46] Alejandro Ribeiro,et al. Constrained Reinforcement Learning Has Zero Duality Gap , 2019, NeurIPS.
[47] Javier García,et al. A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res..
[48] Gábor Orosz,et al. End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks , 2019, AAAI.