Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design
[1] Kevin G. Jamieson,et al. Instance-optimal PAC Algorithms for Contextual Bandits , 2022, NeurIPS.
[2] A. Krause,et al. Active Exploration via Experiment Design in Markov Chains , 2022, AISTATS.
[3] Aymen Al Marjani,et al. Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs , 2022, NeurIPS.
[4] Kevin G. Jamieson,et al. Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes , 2022, ICML.
[5] Dylan J. Foster,et al. The Statistical Complexity of Interactive Decision Making , 2021, ArXiv.
[6] Kevin G. Jamieson,et al. First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach , 2021, ICML.
[7] Prateek Jain,et al. Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs , 2021, ICLR.
[8] Kevin G. Jamieson,et al. Beyond No Regret: Instance-Dependent PAC Reinforcement Learning , 2021, COLT.
[9] Julian Zimmert,et al. Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning , 2021, NeurIPS.
[10] Alexandre Proutière,et al. Navigating to the Best Policy in Markov Decision Processes , 2021, NeurIPS.
[11] Satinder Singh,et al. Reward is enough for convex MDPs , 2021, NeurIPS.
[12] Sham M. Kakade,et al. An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap , 2021, NeurIPS.
[13] Shachar Lovett,et al. Bilinear Classes: A Structural Framework for Provable Generalization in RL , 2021, ICML.
[14] Max Simchowitz,et al. Task-Optimal Exploration in Linear Dynamical Systems , 2021, ICML.
[15] Tengyu Ma,et al. Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive Multi-Step Bootstrap , 2021, COLT.
[16] Chi Jin,et al. Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms , 2021, NeurIPS.
[17] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints , 2021, NeurIPS.
[18] Quanquan Gu,et al. Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes , 2020, COLT.
[19] Quanquan Gu,et al. Logarithmic Regret for Reinforcement Learning with Linear Function Approximation , 2020, ICML.
[20] Csaba Szepesvari,et al. Online Sparse Reinforcement Learning , 2020, AISTATS.
[21] Vahab Mirrokni,et al. Optimal Approximation-Smoothness Tradeoffs for Soft-Max Functions , 2020, NeurIPS.
[22] Csaba Szepesvári,et al. Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions , 2020, ALT.
[23] S. Du,et al. Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon , 2020, COLT.
[24] Alexandre Proutière,et al. Best Policy Identification in Discounted MDPs: Problem-specific Sample Complexity , 2020, ArXiv.
[25] Mykel J. Kochenderfer,et al. Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration , 2020, NeurIPS.
[26] Anders Jonsson,et al. Fast active learning for pure exploration in reinforcement learning , 2020, ICML.
[27] Csaba Szepesvari,et al. Bandit Algorithms , 2020 .
[28] Quanquan Gu,et al. Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping , 2020, ICML.
[29] E. Kaufmann,et al. Planning in Markov Decision Processes with Gap-Dependent Sample Complexity , 2020, NeurIPS.
[30] Mengdi Wang,et al. Model-Based Reinforcement Learning with Value-Targeted Regression , 2020, L4DC.
[31] Mykel J. Kochenderfer,et al. Learning Near Optimal Policies with Low Inherent Bellman Error , 2020, ICML.
[32] Ruosong Wang,et al. Optimism in Reinforcement Learning with Generalized Linear Function Approximation , 2019, ICLR.
[33] Alessandro Lazaric,et al. Frequentist Regret Bounds for Randomized Least-Squares Value Iteration , 2019, AISTATS.
[34] Lin F. Yang,et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? , 2019, ICLR.
[35] Lalit Jain,et al. Sequential Experimental Design for Transductive Linear Bandits , 2019, NeurIPS.
[36] Max Simchowitz,et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs , 2019, NeurIPS.
[37] Mengdi Wang,et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features , 2019, ICML.
[38] Sham M. Kakade,et al. Provably Efficient Maximum Entropy Exploration , 2018, ICML.
[39] Lihong Li,et al. Policy Certificates: Towards Accountable Reinforcement Learning , 2018, ICML.
[40] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[41] Alexandre Proutière,et al. Exploration in Structured Reinforcement Learning , 2018, NeurIPS.
[42] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.
[43] Christoph Dann,et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning , 2015, NIPS.
[44] Alessandro Lazaric,et al. Best-Arm Identification in Linear Bandits , 2014, NIPS.
[45] Aurélien Garivier,et al. On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models , 2014, J. Mach. Learn. Res..
[46] Roman Vershynin,et al. Introduction to the non-asymptotic analysis of random matrices , 2010, Compressed Sensing.
[47] Kunal Talwar,et al. Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).
[48] Francisco S. Melo,et al. Q-Learning with Linear Function Approximation , 2007, COLT.
[49] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[50] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[51] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.
[52] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[53] D. Freedman. On Tail Probabilities for Martingales , 1975 .
[54] Philip Wolfe,et al. An algorithm for quadratic programming , 1956 .
[55] Xiangyang Ji,et al. Variance-Aware Confidence Set: Variance-Dependent Bound for Linear Bandits and Horizon-Free Bound for Linear Mixture MDP , 2021, ArXiv.
[56] Mykel J. Kochenderfer,et al. Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model , 2019, NeurIPS.
[57] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[58] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003 .
[59] Michael Jackson,et al. Optimal Design of Experiments , 1994 .