The Expected-Length Model of Options
Marie desJardins | Michael L. Littman | John Winder | David Abel
[1] Alessandro Lazaric, et al. Exploration–Exploitation in MDPs with Options, 2016.
[2] Peter Stone, et al. Hierarchical model-based reinforcement learning: R-max + MAXQ, 2008, ICML '08.
[3] Marlos C. Machado, et al. A Laplacian Framework for Option Discovery in Reinforcement Learning, 2017, ICML.
[4] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[5] Lihong Li, et al. PAC-inspired Option Discovery in Lifelong Reinforcement Learning, 2014, ICML.
[6] Leslie Pack Kaelbling, et al. From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning, 2018, J. Artif. Intell. Res.
[7] Leslie Pack Kaelbling, et al. On the Complexity of Solving Markov Decision Problems, 1995, UAI.
[8] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[9] Doina Precup, et al. Learning with Options that Terminate Off-Policy, 2017, AAAI.
[10] Stefanie Tellex, et al. Planning with Abstract Markov Decision Processes, 2017, ICAPS.
[11] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[12] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[13] Erik Talvitie, et al. Self-Correcting Models for Model-Based Reinforcement Learning, 2016, AAAI.
[14] Shie Mannor, et al. Adaptive Skills Adaptive Partitions (ASAP), 2016, NIPS.
[15] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[16] Kavosh Asadi, et al. Lipschitz Continuity in Model-based Reinforcement Learning, 2018, ICML.
[17] Marie desJardins, et al. Portable Option Discovery for Automated Learning Transfer in Object-Oriented Markov Decision Processes, 2015, IJCAI.
[18] Shie Mannor, et al. Time-regularized interrupting options, 2014, ICML.
[19] Doina Precup, et al. Multi-Time Models for Reinforcement Learning, 2007.
[20] P. Tseng. Solving H-horizon, stationary Markov decision problems in time proportional to log(H), 1990.
[21] Andrew G. Barto, et al. Building Portable Options: Skill Transfer in Reinforcement Learning, 2007, IJCAI.
[22] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.
[25] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[26] Shie Mannor, et al. Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations, 2014, ICML.