The Expected-Length Model of Options

Effective options can make reinforcement learning easier by enhancing an agent's ability both to explore in a targeted manner and to plan further into the future. However, learning an appropriate model of an option's dynamics is hard, requiring the estimation of a highly parameterized probability distribution. This paper introduces and motivates the Expected-Length Model (ELM) of options, an alternative model of option transition dynamics. We prove that ELM is a (biased) estimator of the traditional Multi-Time Model (MTM), but provide a non-vacuous bound on their deviation. We further prove that, in stochastic shortest-path problems, ELM induces a value function sufficiently similar to the one induced by MTM, and is thus capable of supporting near-optimal behavior. We explore the practical utility of this option model experimentally, finding consistent support for the thesis that ELM is a suitable replacement for MTM. In some cases, we find that ELM leads to more sample-efficient learning, especially when options are arranged in a hierarchy.
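
To make the contrast between the two models concrete, the Python sketch below estimates both from sampled option executions and applies an option-level Bellman backup. It is a minimal illustration under assumed definitions, not the authors' implementation; the names (Execution, estimate_mtm, estimate_elm, backup) are hypothetical. The key difference: MTM discounts each termination outcome by gamma raised to that rollout's actual duration, whereas ELM discounts the estimated termination-state distribution once by gamma raised to the option's expected duration.

    # Sketch only: contrasting MTM and ELM transition models for a single option.
    from collections import defaultdict
    from dataclasses import dataclass
    from typing import Dict, Hashable, List

    State = Hashable

    @dataclass
    class Execution:
        """One rollout of the option from a fixed start state."""
        length: int       # number of primitive steps before termination
        end_state: State  # state in which the option terminated

    def estimate_mtm(rollouts: List[Execution], gamma: float) -> Dict[State, float]:
        """MTM-style transition model: each observed outcome is weighted by
        gamma**length, so the entries sum to less than one and encode duration."""
        model = defaultdict(float)
        for ex in rollouts:
            model[ex.end_state] += (gamma ** ex.length) / len(rollouts)
        return dict(model)

    def estimate_elm(rollouts: List[Execution], gamma: float) -> Dict[State, float]:
        """ELM-style transition model: estimate the termination-state distribution
        and the expected length separately, then discount once by gamma**(mean length)."""
        dist = defaultdict(float)
        for ex in rollouts:
            dist[ex.end_state] += 1.0 / len(rollouts)
        mean_len = sum(ex.length for ex in rollouts) / len(rollouts)
        return {s: (gamma ** mean_len) * p for s, p in dist.items()}

    def backup(model: Dict[State, float], reward: float, values: Dict[State, float]) -> float:
        """One option-level Bellman backup; discounting is already baked into the model."""
        return reward + sum(p * values[s] for s, p in model.items())

    if __name__ == "__main__":
        rollouts = [Execution(3, "A"), Execution(5, "A"), Execution(4, "B")]
        values = {"A": 1.0, "B": 0.5}
        gamma = 0.95
        print("MTM backup:", backup(estimate_mtm(rollouts, gamma), 0.1, values))
        print("ELM backup:", backup(estimate_elm(rollouts, gamma), 0.1, values))

In this toy usage, ELM needs only a termination distribution and a single expected length per (state, option) pair, rather than a joint distribution over termination states and durations, which is the source of its statistical advantage and of the bias bounded in the paper.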
