Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
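In brief (a minimal sketch of the paper's core identity, following its notation; see the Dietterich ICML-98 and JAIR entries below for the full treatment): the MAXQ decomposition splits the action-value function of a parent subtask i, when it invokes child subtask a in state s, into the value of the child plus a completion term,

Q^{\pi}(i, s, a) = V^{\pi}(a, s) + C^{\pi}(i, s, a),

where V^{\pi}(a, s) is the expected cumulative reward earned while child a executes from s (defined recursively as V^{\pi}(a, s) = Q^{\pi}(a, s, \pi(s)) for composite subtasks, and as the expected one-step reward for primitive actions), and C^{\pi}(i, s, a) is the expected cumulative reward for completing parent subtask i after a terminates. Applied recursively, this identity expresses the value of the root task as a sum of completion terms along a path down the subtask hierarchy plus a single primitive reward.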
[1] Ronald A. Howard. Dynamic Programming and Markov Processes, 1960.
[2] Richard Fikes, et al. Learning and Executing Generalized Robot Plans, 1993, Artif. Intell.
[3] Earl D. Sacerdoti. Planning in a Hierarchy of Abstraction Spaces, 1974, IJCAI.
[4] Charles L. Forgy. Rete: A Fast Algorithm for the Many Patterns/Many Objects Match Problem, 1982, Artif. Intell.
[5] Richard E. Korf. Macro-Operators: A Weak Method for Learning, 1985, Artif. Intell.
[6] Allen Newell, et al. SOAR: An Architecture for General Intelligence, 1987, Artif. Intell.
[7] C. Watkins. Learning from Delayed Rewards, PhD thesis, Cambridge University, 1989.
[8] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 1991, Morgan Kaufmann Series in Representation and Reasoning.
[9] Craig A. Knoblock. Learning Abstraction Hierarchies for Problem Solving, 1990, AAAI.
[8] Judea Pearl,et al. Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.
[9] Craig A. Knoblock. Learning Abstraction Hierarchies for Problem Solving , 1990, AAAI.
[10] Charles L. Forgy,et al. Rete: a fast algorithm for the many pattern/many object pattern match problem , 1991 .
[11] Austin Tate,et al. O-Plan: The open Planning Architecture , 1991, Artif. Intell..
[12] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[13] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[14] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[15] Milind Tambe,et al. Investigating Production System Representations for Non-Combinatorial Match , 1994, Artif. Intell..
[16] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[17] Richard S. Sutton,et al. TD Models: Modeling the World at a Mixture of Time Scales , 1995, ICML.
[18] Thomas Dean,et al. Decomposition Techniques for Planning in Stochastic Domains , 1995, IJCAI.
[19] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[20] Craig Boutilier,et al. Exploiting Structure in Policy Construction , 1995, IJCAI.
[21] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[22] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[23] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[24] Craig Boutilier,et al. Approximate Value Trees in Structured Dynamic Programming , 1996, ICML.
[25] Thomas G. Dietterich,et al. Hierarchical Explanation-Based Reinforcement Learning , 1997, ICML.
[26] Craig Boutilier,et al. Economic Principles of Multi-Agent Systems , 1997, Artif. Intell..
[27] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[28] Doina Precup,et al. Multi-time Models for Temporally Abstract Planning , 1997, NIPS.
[29] Csaba Szepesvari,et al. Module Based Reinforcement Learning for a Real Robot , 1997 .
[30] Milos Hauskrecht,et al. Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.
[31] R. Sutton. Between MDPs and Semi-MDPs : Learning , Planning , and Representing Knowledge at Multiple Temporal Scales , 1998 .
[32] Ronald E. Parr,et al. Hierarchical control and learning for markov decision processes , 1998 .
[33] Doina Precup,et al. Between MOPs and Semi-MOP: Learning, Planning & Representing Knowledge at Multiple Temporal Scales , 1998 .
[34] Thomas G. Dietterich. The MAXQ Method for Hierarchical Reinforcement Learning , 1998, ICML.
[35] Ronald Parr,et al. Flexible Decomposition Algorithms for Weakly Coupled Markov Decision Problems , 1998, UAI.
[36] Balaraman Ravindran,et al. Improved Switching among Temporally Abstract Actions , 1998, NIPS.
[37] Craig Boutilier,et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage , 1999, J. Artif. Intell. Res..
[38] Andrew W. Moore,et al. Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs , 1999, IJCAI.