Quanquan Gu | Amy Zhang | Jiafan He | Dongruo Zhou | Weitong Zhang