Model-Based Reinforcement Learning Exploiting State-Action Equivalence
Mohammad Sadegh Talebi | Odalric-Ambrym Maillard | Mahsa Asadi | Hippolyte Bourel
[1] T. Lai, et al. Self-Normalized Processes: Limit Theory and Statistical Applications, 2001.
[2] Shie Mannor, et al. "How hard is my MDP?" The distribution-norm to the rescue, 2014, NIPS.
[3] Robert Givan, et al. Equivalence notions and model minimization in Markov decision processes, 2003, Artif. Intell.
[4] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[5] Odalric-Ambrym Maillard. Mathematics of Statistical Sequential Decision Making (Mathématique de la prise de décision séquentielle statistique), 2019.
[6] Robert Givan, et al. Model Reduction Techniques for Computing Approximately Optimal Solutions for Markov Decision Processes, 1997, UAI.
[7] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[8] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[9] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[10] Michael L. Littman, et al. Efficient Reinforcement Learning with Relocatable Action Models, 2007, AAAI.
[11] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution, 2003.
[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[13] Sham M. Kakade. On the sample complexity of reinforcement learning, 2003.
[14] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[15] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[16] Ronald Ortner. Adaptive Aggregation for Reinforcement Learning in Average Reward Markov Decision Processes, 2013.
[17] Lihong Li, et al. Sample Complexity of Multi-task Reinforcement Learning, 2013, UAI.
[18] Doina Precup, et al. Metrics for Finite Markov Decision Processes, 2004, AAAI.
[19] Doina Precup, et al. Bisimulation Metrics for Continuous Markov Decision Processes, 2011, SIAM J. Comput.
[20] Ronald Ortner, et al. Selecting Near-Optimal Approximate State Representations in Reinforcement Learning, 2014, ALT.
[21] Azadeh Khaleghi, et al. Consistent Algorithms for Clustering Time Series, 2016, J. Mach. Learn. Res.
[22] Alessandro Lazaric, et al. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning, 2018, ICML.
[23] Apostolos Burnetas, et al. Optimal Adaptive Policies for Markov Decision Processes, 1997, Math. Oper. Res.
[24] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[25] Lihong Li, et al. The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning, 2009, ICML '09.
[26] Zoran Popovic, et al. Efficient Bayesian Clustering for Reinforcement Learning, 2016, IJCAI.
[27] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[28] Mohammad Sadegh Talebi, et al. Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs, 2018, ALT.
[29] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[30] Balaraman Ravindran. Approximate Homomorphisms: A framework for non-exact minimization in Markov Decision Processes, 2004.
[31] Parag Singla, et al. ASAP-UCT: Abstraction of State-Action Pairs in UCT, 2015, IJCAI.
[32] Michael L. Littman, et al. Near Optimal Behavior via Approximate State Abstraction, 2016, ICML.
[33] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[34] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[35] Shie Mannor, et al. Model Selection in Markovian Processes, 2013, KDD.