Bilinear Classes: A Structural Framework for Provable Generalization in RL
Simon S. Du | Sham M. Kakade | Jason D. Lee | Shachar Lovett | Gaurav Mahajan | Wen Sun | Ruosong Wang
[1] Csaba Szepesvári, et al. Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions, 2020, ALT.
[2] Nikolai Matni, et al. On the Sample Complexity of the Linear Quadratic Regulator, 2017, Foundations of Computational Mathematics.
[3] Nan Jiang, et al. Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches, 2018, COLT.
[4] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[5] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.
[6] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[7] Mengdi Wang, et al. Model-Based Reinforcement Learning with Value-Targeted Regression, 2020, L4DC.
[8] Jan Peters, et al. Reinforcement learning in robotics: A survey, 2013, Int. J. Robotics Res.
[9] Alessandro Lazaric, et al. Learning Near Optimal Policies with Low Inherent Bellman Error, 2020, ICML.
[10] Akshay Krishnamurthy, et al. Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning, 2019, ICML.
[11] G. A. Young, et al. High-Dimensional Statistics: A Non-Asymptotic Viewpoint, by Martin J. Wainwright, Cambridge University Press, 2019 (book review), 2020, International Statistical Review.
[12] Ruosong Wang, et al. Optimism in Reinforcement Learning with Generalized Linear Function Approximation, 2019, ICLR.
[13] John Langford, et al. PAC Reinforcement Learning with Rich Observations, 2016, NIPS.
[14] Ambuj Tewari, et al. Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles, 2019, AISTATS.
[15] Ruosong Wang, et al. Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle, 2019, NeurIPS.
[16] Nan Jiang, et al. Abstraction Selection in Model-based Reinforcement Learning, 2015, ICML.
[17] Wen Sun, et al. PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning, 2020, NeurIPS.
[18] Zheng Wen, et al. Efficient Exploration and Value Function Generalization in Deterministic Systems, 2013, NIPS.
[19] Ruosong Wang, et al. Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension, 2020, NeurIPS.
[20] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[21] Chi Jin, et al. Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms, 2021, NeurIPS.
[22] Michael L. Littman, et al. A unifying framework for computational reinforcement learning theory, 2009.
[23] Ruosong Wang, et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?, 2020, ICLR.
[24] Sham M. Kakade, et al. A Short Note on the Relationship of Information Gain and Eluder Dimension, 2021, ArXiv.
[25] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[26] Nan Jiang, et al. Provably efficient RL with Rich Observations via Latent State Decoding, 2019, ICML.
[27] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[28] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.
[29] Akshay Krishnamurthy, et al. FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs, 2020, NeurIPS.
[30] Michael I. Jordan, et al. Bridging Exploration and General Function Approximation in Reinforcement Learning: Provably Efficient Kernel and Neural Value Iterations, 2020, ArXiv.
[31] Alexander Rakhlin, et al. Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles, 2020, ICML.
[32] Philip M. Long, et al. Reinforcement Learning with Immediate Rewards and Linear Hypotheses, 2003, Algorithmica.
[33] Akshay Krishnamurthy, et al. Information Theoretic Regret Bounds for Online Nonlinear Control, 2020, NeurIPS.
[34] Nan Jiang, et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable, 2016, ICML.
[35] Zhengyuan Zhou, et al. Provably Efficient Reinforcement Learning with Aggregated States, 2019, ArXiv.
[36] Tengyu Ma, et al. On the Expressivity of Neural Networks for Deep Reinforcement Learning, 2019, ICML.
[37] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[38] Mengdi Wang, et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features, 2019, ICML.
[39] Jian Peng, et al. √n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank, 2019, COLT.
[40] Martin J. Wainwright, et al. High-Dimensional Statistics, 2019.
[41] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[42] Ruosong Wang, et al. Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity, 2020, ArXiv.
[43] Rémi Munos, et al. Error Bounds for Approximate Value Iteration, 2005, AAAI.
[44] Alexandre M. Bayen, et al. Framework for control and deep reinforcement learning in traffic, 2017, IEEE 20th International Conference on Intelligent Transportation Systems (ITSC).