Improving PAC Exploration Using the Median Of Means
[1] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[2] J. Harrison. Discrete Dynamic Programming with Unbounded Rewards, 1972.
[3] Marco Aiello, et al. AAAI Conference on Artificial Intelligence, 2011, AAAI Conference on Artificial Intelligence.
[4] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[5] Jason Pazis, et al. Efficient PAC-Optimal Exploration in Concurrent, Continuous State MDPs with Delayed Updates, 2016, AAAI.
[6] Jason Pazis, et al. PAC Optimal Exploration in Continuous Space Markov Decision Processes, 2013, AAAI.
[7] Noga Alon, et al. The Space Complexity of Approximating the Frequency Moments, 1999.
[8] Benjamin Van Roy, et al. Model-based Reinforcement Learning and the Eluder Dimension, 2014, NIPS.
[9] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[10] Shie Mannor, et al. "How hard is my MDP?" The distribution-norm to the rescue, 2014, NIPS.
[11] Tor Lattimore, et al. PAC Bounds for Discounted MDPs, 2012, ALT.
[12] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[13] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[14] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[15] Emma Brunskill, et al. Concurrent PAC RL, 2015, AAAI.
[16] Colin McDiarmid, et al. Surveys in Combinatorics, 1989: On the method of bounded differences, 1989.
[17] Csaba Szepesvári, et al. Model-based reinforcement learning with nearly tight exploration complexity bounds, 2010, ICML.