Anders Jonsson | Michal Valko | Emilie Kaufmann | Pierre Ménard | Omar Darwiche Domingues | Edouard Leurent
[1] E. Kaufmann, et al. Planning in Markov Decision Processes with Gap-Dependent Sample Complexity, 2020, NeurIPS.
[2] Xuezhou Zhang, et al. Task-agnostic Exploration in Reinforcement Learning, 2020, NeurIPS.
[3] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[4] Lihong Li, et al. PAC Model-Free Reinforcement Learning, 2006, ICML.
[5] Shie Mannor, et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, 2006, J. Mach. Learn. Res.
[6] Haim Kaplan, et al. Near-Optimal Regret Bounds for Stochastic Shortest Path, 2020, ICML.
[7] Akshay Krishnamurthy, et al. Reward-Free Exploration for Reinforcement Learning, 2020, ICML.
[8] Sham M. Kakade, et al. On the Sample Complexity of Reinforcement Learning, 2003.
[9] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[10] T. Lai, et al. Self-Normalized Processes: Exponential Inequalities, Moment Bounds and Iterated Logarithm Laws, 2004, math/0410102.
[11] Ronald Ortner, et al. Autonomous Exploration for Navigating in Non-Stationary CMPs, 2019, arXiv.
[12] Michael L. Littman, et al. An Analysis of Model-Based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[13] Michael Kearns, et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, 1998, NIPS.
[14] Jürgen Schmidhuber, et al. A Possibility for Implementing Curiosity and Boredom in Model-Building Neural Controllers, 1991.
[15] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[16] Michael I. Jordan, et al. Is Q-Learning Provably Efficient?, 2018, NeurIPS.
[17] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[18] Claude-Nicolas Fiechter, et al. Efficient Reinforcement Learning, 1994, COLT '94.
[19] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge Using Value Function Bounds, 2019, ICML.
[20] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[21] Nuttapong Chentanez, et al. Intrinsically Motivated Reinforcement Learning, 2004, NIPS.
[22] Keyan Zahedi, et al. Information Theoretically Aided Reinforcement Learning for Embodied Agents, 2016, arXiv.
[23] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[24] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[25] Shakir Mohamed, et al. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, 2015, NIPS.
[26] Sarah Filippi, et al. Optimism in Reinforcement Learning and Kullback-Leibler Divergence, 2010, 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[27] Doina Precup, et al. An Information-Theoretic Approach to Curiosity-Driven Reinforcement Learning, 2012, Theory in Biosciences.
[28] Ruosong Wang, et al. On Reward-Free Reinforcement Learning with Linear Function Approximation, 2020, NeurIPS.
[29] Filip De Turck, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.
[30] Alessandro Lazaric, et al. No-Regret Exploration in Goal-Oriented Reinforcement Learning, 2020, ICML.
[31] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[32] Peter Auer, et al. Near-Optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[33] Sham M. Kakade, et al. Provably Efficient Maximum Entropy Exploration, 2018, ICML.
[34] Peter Auer, et al. Autonomous Exploration for Navigating in MDPs, 2012, COLT.