Upper Confidence Reinforcement Learning exploiting state-action equivalence
[1] Ronald Ortner, et al. Selecting Near-Optimal Approximate State Representations in Reinforcement Learning, 2014, ALT.
[2] Shie Mannor, et al. Model selection in markovian processes, 2013, KDD.
[3] T. Lai, et al. Self-Normalized Processes: Limit Theory and Statistical Applications, 2001.
[4] Doina Precup, et al. Methods for Computing State Similarity in Markov Decision Processes, 2006, UAI.
[5] Ronald Ortner, et al. Adaptive Aggregation for Reinforcement Learning in Average Reward Markov Decision Processes, 2022.
[6] Balaraman Ravindran. Approximate Homomorphisms: A framework for non-exact minimization in Markov Decision Processes, 2022.
[7] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[8] Doina Precup, et al. Metrics for Finite Markov Decision Processes, 2004, AAAI.
[9] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[10] Parag Singla, et al. ASAP-UCT: Abstraction of State-Action Pairs in UCT, 2015, IJCAI.
[11] Sarah Filippi, et al. Optimism in reinforcement learning and Kullback-Leibler divergence, 2010, 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[12] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[13] Shipra Agrawal, et al. Optimistic posterior sampling for reinforcement learning: worst-case regret bounds, 2022, NIPS.
[14] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[15] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[16] Robert Givan, et al. Model Reduction Techniques for Computing Approximately Optimal Solutions for Markov Decision Processes, 1997, UAI.
[17] Michael L. Littman, et al. Near Optimal Behavior via Approximate State Abstraction, 2016, ICML.
[18] Azadeh Khaleghi, et al. Online Clustering of Processes, 2012, AISTATS.
[19] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[20] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.