[1] Peter Stone, et al. Structure Learning in Ergodic Factored MDPs without Knowledge of the Transition Function's In-Degree, 2011, ICML.
[2] Robert Givan, et al. Bounded-parameter Markov decision processes, 2000, Artif. Intell.
[3] Shipra Agrawal, et al. Optimistic posterior sampling for reinforcement learning: worst-case regret bounds, 2017, NIPS.
[4] Keiji Kanazawa, et al. A model for reasoning about persistence and causation, 1989.
[5] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[6] Shobha Venkataraman, et al. Efficient Solution Algorithms for Factored MDPs, 2003, J. Artif. Intell. Res.
[7] Christos Dimitrakakis, et al. Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities, 2019, ArXiv.
[8] Sarah Filippi, et al. Optimism in reinforcement learning and Kullback-Leibler divergence, 2010, 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[9] Martin L. Puterman, et al. A probabilistic analysis of bias optimality in unichain Markov decision processes, 2001, IEEE Trans. Autom. Control.
[10] Haipeng Luo, et al. Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits, 2016, NIPS.
[11] Benjamin Van Roy, et al. Near-optimal Reinforcement Learning in Factored MDPs, 2014, NIPS.
[12] Eli Upfal, et al. Multi-Armed Bandits in Metric Spaces, 2008.
[13] Michael L. Littman, et al. Efficient Structure Learning in Factored-State MDPs, 2007, AAAI.
[14] Haipeng Luo, et al. Efficient Contextual Bandits in Non-stationary Worlds, 2017, COLT.
[15] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[16] Alessandro Lazaric, et al. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning, 2018, ICML.
[17] Zoubin Ghahramani, et al. Learning Dynamic Bayesian Networks, 1997, Summer School on Neural Networks.
[18] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[19] Alexander L. Strehl, et al. Model-Based Reinforcement Learning in Factored-State MDPs, 2007, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[20] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[21] Craig Boutilier, et al. Stochastic dynamic programming with factored representations, 2000, Artif. Intell.
[22] Shie Mannor, et al. Off-policy Model-based Learning under Unknown Factored Dynamics, 2015, ICML.
[23] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[24] Yi Ouyang, et al. Learning Unknown Markov Decision Processes: A Thompson Sampling Approach, 2017, NIPS.
[25] Carlos Guestrin, et al. Max-norm Projections for Factored MDPs, 2001, IJCAI.
[26] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2010, J. Mach. Learn. Res.
[27] Dale Schuurmans, et al. Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs, 2002, ICML.
[28] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.