[1] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[2] Kamyar Azizzadenesheli, et al. Reinforcement Learning in Rich-Observation MDPs using Spectral Methods, 2016, ArXiv.
[3] Shipra Agrawal, et al. Optimistic posterior sampling for reinforcement learning: worst-case regret bounds, 2017, NIPS.
[4] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[5] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[6] Sham M. Kakade. On the Sample Complexity of Reinforcement Learning, Ph.D. thesis, 2003.
[7] Aditya Gopalan, et al. On Kernelized Multi-armed Bandits, 2017, ICML.
[8] John N. Tsitsiklis, et al. Linearly Parameterized Bandits, 2008, Math. Oper. Res.
[9] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[10] Lilian Besson, et al. What Doubling Tricks Can and Can't Do for Multi-Armed Bandits, 2018, ArXiv.
[11] Martin J. Wainwright, et al. Randomized sketches for kernels: Fast and optimal non-parametric regression, 2015, ArXiv.
[12] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[13] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[14] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.
[15] Ruosong Wang, et al. Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle, 2019, NeurIPS.
[16] Ambuj Tewari, et al. Contextual Markov Decision Processes using Generalized Linear Models, 2019, ArXiv.
[17] Amnon Shashua, et al. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving, 2016, ArXiv.
[18] Lihong Li, et al. Policy Certificates: Towards Accountable Reinforcement Learning, 2018, ICML.
[19] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[20] Mengdi Wang, et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features, 2019, ICML.
[21] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[22] Zheng Wen, et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.
[23] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[24] Lihong Li, et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning, 2008, ICML.
[25] Tor Lattimore, et al. Near-optimal PAC bounds for discounted MDPs, 2014, Theor. Comput. Sci.
[26] Ruosong Wang, et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?, 2020, ICLR.
[27] Lin F. Yang, et al. Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model, 2018, ArXiv.
[28] Nello Cristianini, et al. Finite-Time Analysis of Kernelised Contextual Bandits, 2013, UAI.
[29] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[30] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[31] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[32] Benjamin Van Roy, et al. On Lower Bounds for Regret in Reinforcement Learning, 2016, ArXiv.
[33] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[34] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW.
[35] Ambuj Tewari, et al. No-regret Exploration in Contextual Reinforcement Learning, 2019, UAI.
[36] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[37] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.
[38] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[39] Csaba Szepesvári, et al. Model-based reinforcement learning with nearly tight exploration complexity bounds, 2010, ICML.
[40] Mengdi Wang, et al. Sample-Optimal Parametric Q-Learning with Linear Transition Models, 2019, ICML.
[41] Nello Cristianini, et al. Kernel Methods for Pattern Analysis, 2004, Cambridge University Press.
[42] Aditya Gopalan, et al. Online Learning in Kernelized Markov Decision Processes, 2019, AISTATS.
[43] Dean Alderucci. A Spectral Algorithm for Learning Hidden Markov Models That Have Silent States, 2015.
[44] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[45] Jan Peters, et al. Reinforcement learning in robotics: A survey, 2013, Int. J. Robotics Res.
[46] Kamyar Azizzadenesheli, et al. Efficient Exploration Through Bayesian Deep Q-Networks, 2018, Information Theory and Applications Workshop (ITA).
[47] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[48] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.