Unknown mixing times in apprenticeship and reinforcement learning
[1] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res.
[2] Aryeh Kontorovich,et al. Estimating the Mixing Time of Ergodic Markov Chains , 2019, COLT.
[3] Yishay Mansour,et al. Online Markov Decision Processes , 2009, Math. Oper. Res.
[4] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res.
[5] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.
[6] Haim Kaplan,et al. Apprenticeship Learning via Frank-Wolfe , 2019, AAAI.
[7] Lihong Li,et al. Scalable Bilinear π Learning Using State and Action Features , 2018, ICML.
[8] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002 .
[9] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[10] David Bruce Wilson,et al. Exact sampling with coupled Markov chains and applications to statistical mechanics , 1996, Random Struct. Algorithms.
[11] Olle Häggström. Finite Markov Chains and Algorithmic Applications , 2002 .
[12] Robert E. Schapire,et al. A Game-Theoretic Approach to Apprenticeship Learning , 2007, NIPS.
[13] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.
[14] John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 2001, IEEE Trans. Autom. Control.
[15] Sridhar Mahadevan,et al. Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results , 2005, Machine Learning.
[16] S. Karlin,et al. Studies in the Mathematical Theory of Inventory and Production , 1958, Stanford University Press.
[17] Rutherford Aris,et al. Discrete Dynamic Programming , 1965, The Mathematical Gazette.
[18] Mengdi Wang,et al. Primal-Dual π Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems , 2017, ArXiv.
[19] Sridhar Mahadevan,et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.
[20] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[21] Yishay Mansour,et al. Approximate Planning in Large POMDPs via Reusable Trajectories , 1999, NIPS.
[22] Frank Kelly,et al. Networks of queues with customers of different types , 1975, Journal of Applied Probability.
[23] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[24] Abhijit Gosavi,et al. Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning , 2003 .
[25] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[26] D. White. Dynamic programming, Markov chains, and the method of successive approximations , 1963 .
[27] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[28] David Bruce Wilson,et al. How to Get a Perfectly Random Sample from a Generic Markov Chain and Generate a Random Spanning Tree of a Directed Graph , 1998, J. Algorithms.
[29] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res.
[30] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[31] V. Climenhaga. Markov chains and mixing times , 2013 .
[32] David K. Smith,et al. Dynamic Programming and Optimal Control. Volume 1 , 1996 .