Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning

[1] S. Haykin, et al. A Q-learning-based dynamic channel assignment technique for mobile communication systems, 1999.

[2] Doina Precup, et al. Intra-Option Learning about Temporally Abstract Actions, 1998, ICML.

[3] Milos Hauskrecht, et al. Hierarchical Solution of Markov Decision Processes using Macro-actions, 1998, UAI.

[4] Thomas G. Dietterich. The MAXQ Method for Hierarchical Reinforcement Learning, 1998, ICML.

[5] Kee-Eung Kim, et al. Solving Very Large Weakly Coupled Markov Decision Processes, 1998, AAAI/IAAI.

[6] Chris Drummond, et al. Composing Functions to Speed up Reinforcement Learning in a Changing World, 1998, ECML.

[7] Doina Precup, et al. Theoretical Results on Reinforcement Learning with Temporally Abstract Options, 1998, ECML.

[8] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.

[9] Doina Precup, et al. Multi-time Models for Temporally Abstract Planning, 1997, NIPS.

[10] John N. Tsitsiklis, et al. Reinforcement Learning for Call Admission Control and Routing in Integrated Service Networks, 1997, NIPS.

[11] Roderic A. Grupen, et al. A feedback control structure for on-line learning tasks, 1997, Robotics Auton. Syst.

[12] Jürgen Schmidhuber, et al. HQ-Learning, 1997, Adapt. Behav.

[13] Ronen I. Brafman, et al. Prioritized Goal Decomposition of Markov Decision Processes: Toward a Synthesis of Classical and Decision Theoretic Planning, 1997, IJCAI.

[14] Ronen I. Brafman, et al. Modeling Agents as Qualitative Decision Makers, 1997, Artif. Intell.

[15] Maja J. Matarić, et al. Behaviour-based control: examples from navigation, learning, and group behaviour, 1997, J. Exp. Theor. Artif. Intell.

[16] Satinder Singh, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.

[17] Gerald DeJong, et al. A Statistical Approach to Adaptive Problem Solving, 1996, Artif. Intell.

[18] Minoru Asada, et al. Behavior coordination for a mobile robot using modular reinforcement learning, 1996, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '96).

[19] Leslie Pack Kaelbling, et al. On reinforcement learning for robots, 1996, IROS.

[20] Marco Colombetti, et al. Behavior analysis and training: a methodology for behavior engineering, 1996, IEEE Trans. Syst. Man Cybern. Part B.

[21] Selahattin Kuru, et al. Qualitative System Identification: Deriving Structure from Behavior, 1996, Artif. Intell.

[22] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.

[23] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.

[24] Thomas Dean, et al. Decomposition Techniques for Planning in Stochastic Domains, 1995, IJCAI.

[25] Reid G. Simmons, et al. Probabilistic Robot Navigation in Partially Observable Environments, 1995, IJCAI.

[26] Gerald Tesauro. Temporal difference learning and TD-Gammon, 1995, CACM.

[27] Richard S. Sutton. TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.

[28] Marco Colombetti, et al. Robot Shaping: Developing Autonomous Agents Through Learning, 1994, Artif. Intell.

[29] Steve Ankuo Chien. A Statistical Approach to Adaptive Problem-Solving for Large-Scale Scheduling and Resource Allocation Problems, 1994.

[30] Gerald DeJong, et al. Learning to Plan in Continuous Domains, 1994, Artif. Intell.

[31] Leslie Pack Kaelbling, et al. Planning under Time Constraints in Stochastic Domains, 1993, Artif. Intell.

[32] Roderic A. Grupen, et al. Robust Reinforcement Learning in Motion Planning, 1993, NIPS.

[33] Robert L. Grossman, et al. Timed Automata, 1999, CAV.

[34] Nils J. Nilsson. Teleo-Reactive Programs for Agent Control, 1993, J. Artif. Intell. Res.

[35] Peter Dayan. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.

[36] Leslie Pack Kaelbling. Hierarchical Learning in Stochastic Domains: Preliminary Results, 1993, ICML.

[37] Richard Fikes, et al. Learning and Executing Generalized Robot Plans, 1993, Artif. Intell.

[38] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.

[39] Satinder P. Singh. Reinforcement Learning with a Hierarchy of Abstract Models, 1992, AAAI.

[40] John R. Koza, et al. Automatic Programming of Robots Using Genetic Programming, 1992, AAAI.

[41] Russell Greiner, et al. A Statistical Approach to Solving the EBL Utility Problem, 1992, AAAI.

[42] Satinder P. Singh. The Efficient Learning of Multiple Task Sequences, 1991, NIPS.

[43] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.

[44] Lambert E. Wixson, et al. Scaling Reinforcement Learning Techniques via Modularity, 1991, ML.

[45] Oren Etzioni. Why PRODIGY/EBL Works, 1990, AAAI.

[46] Rodney A. Brooks, et al. Learning to Coordinate Behaviors, 1990, AAAI.

[47] Steven Minton. Quantitative Results Concerning the Utility of Explanation-based Learning, 1988, Artif. Intell.

[48] J. Brown, et al. A Qualitative Physics Based on Confluences, 1984, Artif. Intell.

[49] Benjamin Kuipers. Common-Sense Knowledge of Space: Learning from Experience, 1979, IJCAI.

[50] Earl D. Sacerdoti. Planning in a Hierarchy of Abstraction Spaces, 1974, IJCAI.

[51] Allen Newell, et al. Human Problem Solving, 1973.

[52] R. Howard. Dynamic Programming and Markov Processes, 1960.

[53] R. Sutton, et al. Macro-Actions in Reinforcement Learning: An Empirical Analysis, 1998.

[54] Ronald E. Parr. Hierarchical control and learning for Markov decision processes, 1998.

[55] R. Sutton, et al. Improved Switching among Temporally Abstract Actions, 1998.

[56] Blai Bonet. High-Level Planning and Control with Incomplete Information Using POMDP's, 1998.

[57] Maja J. Matarić, et al. Behavior-based Control: Examples from Navigation, Learning, and Group Behavior, 1997.

[58] Roderic A. Grupen, et al. Learning Control Composition in a Complex Environment, 1996.

[59] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.

[60] David B. Leake, et al. Quantitative Results Concerning the Utility of Explanation-Based Learning, 1995.

[61] Sebastian Thrun, et al. Finding Structure in Reinforcement Learning, 1994, NIPS.

[62] Michael I. Jordan, et al. On the Convergence of Stochastic Iterative Dynamic Programming Algorithms, 1994, Neural Computation.

[63] Michael O. Duff, et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.

[64] L. Chrisman. Reasoning About Probabilistic Actions At Multiple Levels of Granularity, 1994.

[65] Roger W. Brockett, et al. Hybrid Models for Motion Control Systems, 1993.

[66] Marco C. Bettoni, et al. Made-Up Minds: A Constructivist Approach to Artificial Intelligence, 1993, IEEE Expert.

[67] Long-Ji Lin. Reinforcement learning for robots using neural networks, 1992.

[68] Satinder Singh. The Efficient Learning of Multiple Task Sequences, 1992.

[69] Jürgen Schmidhuber. Neural sequence chunkers, 1991, Forschungsberichte, TU Munich.

[70] Gary L. Drescher. Made-Up Minds: A Constructivist Approach to Artificial Intelligence, 1991.

[71] Allen Newell, et al. Chunking in Soar, 1986.

[72] R. Korf. Learning to solve problems by searching for macro-operators, 1983.