Alessandro Lazaric | Ronan Fruit | Csaba Szepesvari | Gergely Neu | Vicenç Gómez
[1] Shie Mannor, et al. Time-Regularized Interrupting Options (TRIO), 2014, ICML.
[2] Nahum Shimkin, et al. Unified Inter and Intra Options Learning Using Policy Gradient Methods, 2011, EWRL.
[3] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[4] Scott Sanner, et al. Recent Advances in Reinforcement Learning - 9th European Workshop, EWRL 2011, 2012.
[5] Manfred Schäl, et al. On the Second Optimality Equation for Semi-Markov Decision Models, 1992, Math. Oper. Res.
[6] Andrew G. Barto, et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.
[7] Peter Stone, et al. The utility of temporal abstraction in reinforcement learning, 2008, AAMAS.
[8] Balaraman Ravindran, et al. Options with Exceptions, 2011, EWRL.
[9] Shie Mannor, et al. Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations, 2014, ICML.
[10] Doina Precup, et al. The Option-Critic Architecture, 2016, AAAI.
[11] W. Gasarch, et al. The Book Review Column 1 Coverage Untyped Systems Simple Types Recursive Types Higher-order Systems General Impression 3 Organization, and Contents of the Book, 2022.
[12] Satinder P. Singh, et al. Linear options, 2010, AAMAS.
[13] Lihong Li, et al. PAC-inspired Option Discovery in Lifelong Reinforcement Learning, 2014, ICML.
[14] Shie Mannor, et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning, 2002, ECML.
[15] Paul J. Schweitzer, et al. Denumerable Undiscounted Semi-Markov Decision Processes with Unbounded Rewards, 1983, Math. Oper. Res.
[16] Ambuj Tewari, et al. Bounded Parameter Markov Decision Processes with Average Reward Criterion, 2007, COLT.
[17] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[18] Georges Zaccour, et al. Decision and control in management science, 2002.
[19] Jianyong Liu, et al. On Average Reward Semi-Markov Decision Processes with a General Multichain Structure, 2004, Math. Oper. Res.
[20] Andrew G. Barto, et al. Using relative novelty to identify useful temporal abstractions in reinforcement learning, 2004, ICML.
[21] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[22] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[23] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[24] Shie Mannor, et al. A Deep Hierarchical Approach to Lifelong Learning in Minecraft, 2016, AAAI.