Adaptive Approximate Policy Iteration
Nevena Lazic | Yasin Abbasi-Yadkori | Csaba Szepesvari | Botao Hao | Pooria Joulani
[1] András György, et al. A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds, 2017, ALT.
[2] Alessandro Lazaric, et al. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning, 2018, ICML.
[3] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[4] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[5] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[6] Karthik Sridharan, et al. Online Learning with Predictable Sequences, 2012, COLT.
[7] Shie Mannor, et al. Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs, 2020, AAAI.
[8] Mohammad Sadegh Talebi, et al. Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs, 2018, ALT.
[9] Bruno Scherrer, et al. Leverage the Average: an Analysis of Regularization in RL, 2020, ArXiv.
[10] Byron Boots, et al. Predictor-Corrector Policy Optimization, 2018, ICML.
[11] Yi Ouyang, et al. Learning Unknown Markov Decision Processes: A Thompson Sampling Approach, 2017, NIPS.
[12] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[13] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[14] S. Kakade, et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes, 2019, COLT.
[15] Karthik Sridharan, et al. Optimization, Learning, and Games with Predictable Sequences, 2013, NIPS.
[16] Zheng Wen, et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.
[17] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[18] Csaba Szepesvári, et al. Online Markov Decision Processes Under Bandit Feedback, 2010, IEEE Transactions on Automatic Control.
[19] Chi Jin, et al. Provably Efficient Exploration in Policy Optimization, 2019, ICML.
[20] Haipeng Luo, et al. Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes, 2020, ICML.
[21] Alessandro Lazaric, et al. Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs, 2019, NeurIPS.
[22] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[23] Nevena Lazic, et al. Model-Free Linear Quadratic Control via Reduction to Expert Prediction, 2018, AISTATS.
[24] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[25] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[26] George Konidaris, et al. Value Function Approximation in Reinforcement Learning Using the Fourier Basis, 2011, AAAI.
[27] J. Andrew Bagnell, et al. Reinforcement and Imitation Learning via Interactive No-Regret Learning, 2014, ArXiv.
[28] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
[29] Alessandro Lazaric, et al. Frequentist Regret Bounds for Randomized Least-Squares Value Iteration, 2020, AISTATS.
[30] S. Ioffe, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.
[31] Daniel Russo, et al. Worst-Case Regret Bounds for Exploration via Randomized Value Functions, 2019, NeurIPS.
[32] H. Brendan McMahan, et al. A Survey of Algorithms and Analysis for Adaptive Online Learning, 2014, J. Mach. Learn. Res.
[33] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[34] Mengdi Wang, et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features, 2019, ICML.
[35] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[36] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[37] Shai Shalev-Shwartz, et al. Online Learning and Online Convex Optimization, 2012, Found. Trends Mach. Learn.
[38] Mengdi Wang, et al. Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound, 2019, ICML.
[39] Yishay Mansour, et al. Online Markov Decision Processes, 2009, Math. Oper. Res.
[40] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[41] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[42] Bruno Scherrer, et al. Momentum in Reinforcement Learning, 2020, AISTATS.
[43] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[44] Mehryar Mohri, et al. Accelerating Online Convex Optimization via Adaptive Prediction, 2016, AISTATS.
[45] Peter L. Bartlett, et al. POLITEX: Regret Bounds for Policy Iteration using Expert Prediction, 2019, ICML.