[1] Marc Abeille, et al. Improved Optimistic Algorithms for Logistic Bandits, 2020, ICML.
[2] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[3] Shie Mannor, et al. Confidence-Budget Matching for Sequential Budgeted Learning, 2021, ICML.
[4] Shie Mannor, et al. Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies, 2019, NeurIPS.
[5] Mykel J. Kochenderfer, et al. Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration, 2020, NeurIPS.
[6] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[7] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[8] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[9] Eyke Hüllermeier, et al. Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, 2012, Mach. Learn.
[10] Michèle Sebag, et al. Preference-Based Policy Learning, 2011, ECML/PKDD.
[11] Johannes Fürnkranz, et al. A Survey of Preference-Based Reinforcement Learning Methods, 2017, J. Mach. Learn. Res.
[12] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[13] Haim Kaplan, et al. Online Markov Decision Processes with Aggregate Bandit Feedback, 2021, COLT.
[14] Akshay Krishnamurthy, et al. Reward-Free Exploration for Reinforcement Learning, 2020, ICML.
[15] Jasjeet S. Sekhon, et al. Time-uniform, nonparametric, nonasymptotic confidence sequences, 2020, The Annals of Statistics.
[16] Shie Mannor, et al. Thompson Sampling for Learning Parameterized Markov Decision Processes, 2014, COLT.
[17] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[18] Benjamin Van Roy, et al. On Lower Bounds for Regret in Reinforcement Learning, 2016, arXiv.
[19] Louis Faury, et al. Self-Concordant Analysis of Generalized Linear Bandits with Forgetting, 2020, AISTATS.
[20] Max Simchowitz, et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs, 2019, NeurIPS.
[21] Demis Hassabis, et al. Improved protein structure prediction using potentials from deep learning, 2020, Nature.
[22] Aurélien Garivier, et al. Parametric Bandits: The Generalized Linear Case, 2010, NIPS.
[23] Csaba Szepesvári, et al. Bandit Algorithms, 2020.
[25] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[26] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[27] Thomas H. Cormen, et al. Introduction to Algorithms, 2009, MIT Press.
[28] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[29] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[30] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[31] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[32] Shie Mannor, et al. Reinforcement Learning with Trajectory Feedback, 2020, arXiv.
[33] Ruosong Wang, et al. Preference-based Reinforcement Learning with Finite-Time Guarantees, 2020, NeurIPS.
[34] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[35] J. Tropp. Freedman's Inequality for Matrix Martingales, 2011, arXiv:1101.3039.
[36] Tengyu Ma, et al. On the Performance of Thompson Sampling on Logistic Bandits, 2019, COLT.
[37] Michèle Sebag, et al. Programming by Feedback, 2014, ICML.
[38] Benjamin Van Roy, et al. Why is Posterior Sampling Better than Optimism for Reinforcement Learning?, 2016, ICML.
[39] Krzysztof Choromanski, et al. On Optimism in Model-Based Reinforcement Learning, 2020, arXiv.
[40] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[41] Joel W. Burdick, et al. Dueling Posterior Sampling for Preference-Based Reinforcement Learning, 2019, UAI.