When Waiting is not an Option: Learning Options with a Deliberation Cost
Doina Precup | Pierre-Luc Bacon | Martin Klissarov | Jean Harb
[1] H. Simon,et al. Models of Man: Social and Rational , 1957 .
[2] H. Simon,et al. Models of Man: Social and Rational. Mathematical Essays on Rational Human Behavior in a Social Setting , 1959 .
[3] Marvin Minsky,et al. Steps toward Artificial Intelligence , 1995, Proceedings of the IRE.
[4] Richard Fikes,et al. Learning and Executing Generalized Robot Plans , 1993, Artif. Intell.
[5] Benjamin Kuipers,et al. Common-Sense Knowledge of Space: Learning from Experience , 1979, IJCAI.
[6] R. Korf. Learning to solve problems by searching for macro-operators , 1983 .
[7] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[8] A. Neyman. Bounded complexity justifies cooperation in the finitely repeated prisoners' dilemma , 1985 .
[9] Linn I. Sennott,et al. Constrained Discounted Markov Decision Chains , 1991, Probability in the Engineering and Informational Sciences.
[10] Gary L. Drescher,et al. Made-up minds - a constructivist approach to artificial intelligence , 1991 .
[11] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[12] Leslie Pack Kaelbling,et al. Hierarchical Learning in Stochastic Domains: Preliminary Results , 1993, ICML.
[13] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[14] Sebastian Thrun,et al. Finding Structure in Reinforcement Learning , 1994, NIPS.
[15] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[16] Thomas G. Dietterich. The MAXQ Method for Hierarchical Reinforcement Learning , 1998, ICML.
[17] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell.
[18] E. Altman. Constrained Markov Decision Processes , 1999 .
[19] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[20] R. Selten,et al. Bounded rationality: The adaptive toolbox , 2000 .
[21] Doina Precup,et al. Temporal abstraction in reinforcement learning , 2000, ICML 2000.
[22] Glenn A. Iba,et al. A heuristic approach to the discovery of macro-operators , 2004, Machine Learning.
[23] Marek Petrik,et al. Biasing Approximate Dynamic Programming with a Lower Discount Factor , 2008, NIPS.
[24] M. Botvinick,et al. Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective , 2009, Cognition.
[25] Regina Barzilay,et al. Learning High-Level Planning from Text , 2012, ACL.
[26] Alec Solway,et al. Optimal Behavioral Hierarchy , 2014, PLoS Comput. Biol.
[27] Shie Mannor,et al. Time-Regularized Interrupting Options (TRIO) , 2014, ICML.
[28] Honglak Lee,et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.
[29] Nan Jiang,et al. The Dependence of Effective Planning Horizon on Model Accuracy , 2015, AAMAS.
[30] Shie Mannor,et al. Approximate Value Iteration with Temporally Extended Actions , 2015, J. Artif. Intell. Res.
[31] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[32] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res.
[33] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[34] Jan Peters,et al. Probabilistic inference for determining options in reinforcement learning , 2016, Machine Learning.
[35] Joshua B. Tenenbaum,et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.
[36] Shie Mannor,et al. Adaptive Skills Adaptive Partitions (ASAP) , 2016, NIPS.
[37] Doina Precup,et al. The Option-Critic Architecture , 2016, AAAI.
[38] Dan Klein,et al. Modular Multitask Reinforcement Learning with Policy Sketches , 2016, ICML.
[39] Marlos C. Machado,et al. A Laplacian Framework for Option Discovery in Reinforcement Learning , 2017, ICML.