Lihong Li | Wei Wei | Christoph Dann | Emma Brunskill
[1] Emma Brunskill,et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds , 2019, ICML.
[2] Benjamin Van Roy,et al. (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.
[3] Tor Lattimore,et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning , 2017, NIPS.
[4] Richard S. Sutton,et al. Multi-step Off-policy Learning Without Importance Sampling Ratios , 2017, ArXiv.
[5] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci.
[6] Thomas J. Walsh,et al. Knows what it knows: a framework for self-aware learning , 2008, ICML '08.
[7] Philip S. Thomas,et al. High Confidence Policy Improvement , 2015, ICML.
[8] Philip S. Thomas,et al. High-Confidence Off-Policy Evaluation , 2015, AAAI.
[9] Chong Li,et al. Model-Free Reinforcement Learning , 2019, Reinforcement Learning for Cyber-Physical Systems.
[10] Jon D. McAuliffe,et al. Uniform, nonparametric, non-asymptotic confidence sequences , 2018.
[11] Yasin Abbasi-Yadkori,et al. Online learning in MDPs with side information , 2014, ArXiv.
[12] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[13] Aaron Roth,et al. Fair Learning in Markovian Environments , 2016, ArXiv.
[14] Csaba Szepesvári,et al. Model-based reinforcement learning with nearly tight exploration complexity bounds , 2010, ICML.
[15] Maria-Florina Balcan,et al. The true sample complexity of active learning , 2010, Machine Learning.
[16] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[17] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[18] Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
[19] Shlomo Zilberstein,et al. Optimal Composition of Real-Time Systems , 1996, Artif. Intell.
[20] Sampath Kannan,et al. Fairness Incentives for Myopic Agents , 2017, EC.
[21] Aaron Roth,et al. Fairness in Reinforcement Learning , 2016, ICML.
[22] Benjamin Van Roy,et al. Model-based Reinforcement Learning and the Eluder Dimension , 2014, NIPS.
[23] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[24] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res.
[25] Martha White,et al. High-confidence error estimates for learned value functions , 2018, UAI.
[26] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003.
[27] Tor Lattimore,et al. PAC Bounds for Discounted MDPs , 2012, ALT.
[28] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[29] Daniele Calandriello,et al. Safe Policy Iteration , 2013, ICML.
[30] Nan Jiang,et al. On Oracle-Efficient PAC RL with Rich Observations , 2018, NeurIPS.
[31] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[32] Christoph Dann,et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning , 2015, NIPS.
[34] Marek Petrik,et al. Safe Policy Improvement by Minimizing Robust Baseline Regret , 2016, NIPS.
[35] Nan Jiang,et al. On Polynomial Time PAC Reinforcement Learning with Rich Observations , 2018, ArXiv.
[36] Nan Jiang,et al. On Oracle-Efficient PAC Reinforcement Learning with Rich Observations , 2018.
[37] Zhiwei Steven Wu,et al. The Externalities of Exploration and How Data Diversity Helps Exploitation , 2018, COLT.
[38] Shie Mannor,et al. Contextual Markov Decision Processes , 2015, ArXiv.
[39] Nan Jiang,et al. Markov Decision Processes with Continuous Side Information , 2017, ALT.
[40] Benjamin Van Roy,et al. Generalization and Exploration via Randomized Value Functions , 2014, ICML.
[41] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res.
[42] Aaron Roth,et al. Fairness in Learning: Classic and Contextual Bandits , 2016, NIPS.
[43] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.