暂无分享,去创建一个
Lawson L. S. Wong | Michael L. Littman | Dilip Arumugam | David Abel | D. Ellis Hershkowitz | Cameron Allen | Lawson L.S. Wong | M. Littman | David Abel | Dilip Arumugam | Cameron S. Allen | D. E. Hershkowitz
[1] Andre Cohen,et al. An object-oriented representation for efficient reinforcement learning , 2008, ICML '08.
[2] Alessandro Lazaric,et al. A Novel Confidence-Based Algorithm for Structured Bandits , 2020, AISTATS.
[3] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[4] Michael L. Littman,et al. Planning in Reward-Rich Domains via PAC Bandits , 2012, EWRL.
[5] Carlos Guestrin,et al. Max-norm Projections for Factored MDPs , 2001, IJCAI.
[6] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.
[7] Emma Brunskill,et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds , 2019, ICML.
[8] Shie Mannor,et al. How hard is my MDP?" The distribution-norm to the rescue" , 2014, NIPS.
[9] Chi Jin,et al. Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms , 2021, NeurIPS.
[10] Ruosong Wang,et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? , 2020, ICLR.
[11] John N. Tsitsiklis,et al. A Structured Multiarmed Bandit Problem and the Greedy Policy , 2008, IEEE Transactions on Automatic Control.
[12] Tor Lattimore,et al. Learning with Good Feature Representations in Bandits and in RL with a Generative Model , 2020, ICML.
[13] Alessandro Lazaric,et al. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning , 2018, ICML.
[14] Mark Jerrum,et al. Approximating the Permanent , 1989, SIAM J. Comput..
[15] Benjamin Van Roy,et al. Eluder Dimension and the Sample Complexity of Optimistic Exploration , 2013, NIPS.
[16] Ambuj Tewari,et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs , 2009, UAI.
[17] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[18] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[19] Zheng Wen,et al. Efficient Exploration and Value Function Generalization in Deterministic Systems , 2013, NIPS.
[20] Amir Massoud Farahmand,et al. Action-Gap Phenomenon in Reinforcement Learning , 2011, NIPS.
[21] Peter Henderson,et al. An Information-Theoretic Perspective on Credit Assignment in Reinforcement Learning , 2021, ArXiv.
[22] Benjamin Van Roy,et al. Comments on the Du-Kakade-Wang-Yang Lower Bounds , 2019, ArXiv.
[23] Ruosong Wang,et al. Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension , 2020, NeurIPS.
[24] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.
[25] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..
[26] Nan Jiang,et al. Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches , 2018, COLT.
[27] Bimal Kumar Roy,et al. Counting, sampling and integrating: Algorithms and complexity , 2013 .
[28] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[29] Benjamin Van Roy,et al. Model-based Reinforcement Learning and the Eluder Dimension , 2014, NIPS.
[30] Marc G. Bellemare,et al. Increasing the Action Gap: New Operators for Reinforcement Learning , 2015, AAAI.