Finite-Memory Near-Optimal Learning for Markov Decision Processes with Long-Run Average Reward
Guillermo A. Pérez | Lukas Michel | Jan Křetínský | Fabian Michel
[1] David R. Karger, et al. Route Planning under Uncertainty: The Canadian Traveller Problem, 2008, AAAI.
[2] U. Rieder, et al. Markov Decision Processes, 2010.
[3] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[4] Jan Křetínský, et al. PAC Statistical Model Checking for Markov Decision Processes and Stochastic Games, 2019, CAV.
[5] Sven Schewe, et al. Omega-Regular Objectives in Model-Free Reinforcement Learning, 2018, TACAS.
[6] Peter Winkler, et al. Exact Mixing in an Unknown Markov Chain, 1995, Electron. J. Comb.
[7] Krishnendu Chatterjee, et al. Verification of Markov Decision Processes Using Learning Algorithms, 2014, ATVA.
[8] Jean-François Raskin, et al. Safe and Optimal Scheduling for Hard and Soft Tasks, 2018, FSTTCS.
[9] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[10] V. Climenhaga. Markov Chains and Mixing Times, 2013.
[11] Hugo Gimbert, et al. Pure Stationary Optimal Strategies in Markov Decision Processes, 2007, STACS.
[12] Edmund M. Clarke, et al. Statistical Model Checking for Markov Decision Processes, 2012, Ninth International Conference on Quantitative Evaluation of Systems (QEST).
[13] Kim G. Larsen, et al. A Modal Process Logic, 1988, Proceedings of the Third Annual Symposium on Logic in Computer Science (LICS).
[14] Peter Dayan, et al. Technical Note: Q-Learning, 1992, Machine Learning.
[15] Lihong Li, et al. PAC Model-Free Reinforcement Learning, 2006, ICML.
[16] Ufuk Topcu, et al. Probably Approximately Correct MDP Learning and Control with Temporal Logic Constraints, 2014, Robotics: Science and Systems.
[17] David Bruce Wilson, et al. How to Get a Perfectly Random Sample from a Generic Markov Chain and Generate a Random Spanning Tree of a Directed Graph, 1998, J. Algorithms.
[18] Thomas A. Henzinger, et al. Faster Statistical Model Checking for Unbounded Temporal Properties, 2016, TACAS.
[19] Jan Křetínský, et al. Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints, 2018, CONCUR.
[20] Mathieu Tracol, et al. Fast Convergence to State-Action Frequency Polytopes for MDPs, 2009, Oper. Res. Lett.
[21] Elizabeth Gibney, et al. Google AI Algorithm Masters Ancient Game of Go, 2016, Nature.
[22] Mihalis Yannakakis, et al. Shortest Paths Without a Map, 1989, Theor. Comput. Sci.
[23] Peter Auer, et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning, 2006, NIPS.
[24] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[25] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[26] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[27] Christel Baier, et al. Principles of Model Checking, 2008.
[28] Richard Lassaigne, et al. Approximate Planning and Verification for Large Markov Decision Processes, 2012, SAC '12.