Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning

[1]  S. Haykin,et al.  A Q-learning-based dynamic channel assignment technique for mobile communication systems , 1999 .

[3]  Balaraman Ravindran,et al.  Improved Switching among Temporally Abstract Actions , 1998, NIPS.

[4]  Doina Precup,et al.  Intra-Option Learning about Temporally Abstract Actions , 1998, ICML.

[5]  Thomas G. Dietterich The MAXQ Method for Hierarchical Reinforcement Learning , 1998, ICML.

[6]  Milos Hauskrecht,et al.  Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.

[7]  Kee-Eung Kim,et al.  Solving Very Large Weakly Coupled Markov Decision Processes , 1998, AAAI/IAAI.

[8]  Doina Precup,et al.  Theoretical Results on Reinforcement Learning with Temporally Abstract Options , 1998, ECML.

[9]  Chris Drummond,et al.  Composing Functions to Speed up Reinforcement Learning in a Changing World , 1998, ECML.

[10]  Ronald E. Parr,et al.  Hierarchical control and learning for markov decision processes , 1998 .

[11]  Blai Bonet High-Level Planning and Control with Incomplete Information Using POMDP's , 1998 .

[12]  R. Sutton,et al.  Macro-Actions in Reinforcement Learning: An Empirical Analysis , 1998 .

[13]  Stuart J. Russell,et al.  Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.

[14]  Doina Precup,et al.  Multi-time Models for Temporally Abstract Planning , 1997, NIPS.

[15]  John N. Tsitsiklis,et al.  Reinforcement Learning for Call Admission Control and Routing in Integrated Service Networks , 1997, NIPS.

[16]  Roderic A. Grupen,et al.  A feedback control structure for on-line learning tasks , 1997, Robotics Auton. Syst..

[17]  Jürgen Schmidhuber,et al.  HQ-Learning , 1997, Adapt. Behav..

[18]  Ronen I. Brafman,et al.  Prioritized Goal Decomposition of Markov Decision Processes: Toward a Synthesis of Classical and Decision Theoretic Planning , 1997, IJCAI.

[19]  Ronen I. Brafman,et al.  Modeling Agents as Qualitative Decision Makers , 1997, Artif. Intell..

[20]  Maja J. Mataric,et al.  Behaviour-based control: examples from navigation, learning, and group behaviour , 1997, J. Exp. Theor. Artif. Intell..

[22]  Dimitri P. Bertsekas,et al.  Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems , 1996, NIPS.

[23]  Gerald DeJong,et al.  A Statistical Approach to Adaptive Problem Solving , 1996, Artif. Intell..

[24]  Leslie Pack Kaelbling,et al.  On reinforcement learning for robots , 1996, IROS.

[25]  Minoru Asada,et al.  Behavior coordination for a mobile robot using modular reinforcement learning , 1996, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. IROS '96.

[26]  Marco Colombetti,et al.  Behavior analysis and training-a methodology for behavior engineering , 1996, IEEE Trans. Syst. Man Cybern. Part B.

[27]  Selahattin Kuru,et al.  Qualitative System Identification: Deriving Structure from Behavior , 1996, Artif. Intell..

[28]  Roderic A. Grupen,et al.  Learning Control Composition in a Complex Environment , 1996 .

[29]  Andrew G. Barto,et al.  Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.

[30]  C. J. C. H. Watkins  Learning from delayed rewards , 1989 .

[31]  Thomas Dean,et al.  Decomposition Techniques for Planning in Stochastic Domains , 1995, IJCAI.

[32]  Reid G. Simmons,et al.  Probabilistic Robot Navigation in Partially Observable Environments , 1995, IJCAI.

[33]  Gerald Tesauro,et al.  Temporal difference learning and TD-Gammon , 1995, CACM.

[34]  Richard S. Sutton,et al.  TD Models: Modeling the World at a Mixture of Time Scales , 1995, ICML.

[35]  Leslie Pack Kaelbling,et al.  Planning under Time Constraints in Stochastic Domains , 1993, Artif. Intell..

[37]  Andrew G. Barto,et al.  Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..

[38]  Marco Colombetti,et al.  Robot Shaping: Developing Autonomous Agents Through Learning , 1994, Artif. Intell..

[39]  Steve Ankuo Chien,et al.  A Statistical Approach to Adaptive Problem-Solving for Large-Scale Scheduling and Resource Allocation Problems , 1994 .

[40]  Gerald DeJong,et al.  Learning to Plan in Continuous Domains , 1994, Artif. Intell..

[41]  Michael I. Jordan,et al.  Massachusetts Institute of Technology, Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences , 1996 .

[42]  Nils J. Nilsson,et al.  Teleo-Reactive Programs for Agent Control , 1993, J. Artif. Intell. Res..

[43]  L. Chrisman Reasoning About Probabilistic Actions At Multiple Levels of Granularity , 1994 .

[44]  Michael O. Duff,et al.  Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.

[45]  Sebastian Thrun,et al.  Finding Structure in Reinforcement Learning , 1994, NIPS.

[46]  Roderic A. Grupen,et al.  Robust Reinforcement Learning in Motion Planning , 1993, NIPS.

[47]  Rajeev Alur  Timed Automata , 1999, CAV.

[48]  Peter Dayan,et al.  Improving Generalization for Temporal Difference Learning: The Successor Representation , 1993, Neural Computation.

[49]  Leslie Pack Kaelbling,et al.  Hierarchical Learning in Stochastic Domains: Preliminary Results , 1993, ICML.

[50]  Roger W. Brockett,et al.  Hybrid Models for Motion Control Systems , 1993 .

[52]  Geoffrey E. Hinton,et al.  Feudal Reinforcement Learning , 1992, NIPS.

[53]  Russell Greiner,et al.  A Statistical Approach to Solving the EBL Utility Problem , 1992, AAAI.

[54]  John R. Koza,et al.  Automatic Programming of Robots Using Genetic Programming , 1992, AAAI.

[55]  Satinder P. Singh,et al.  Reinforcement Learning with a Hierarchy of Abstract Models , 1992, AAAI.

[56]  Sridhar Mahadevan,et al.  Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..

[57]  Long-Ji Lin,et al.  Reinforcement learning for robots using neural networks , 1992 .

[59]  Satinder P. Singh,et al.  The Efficient Learning of Multiple Task Sequences , 1991, NIPS.

[60]  Lambert E. Wixson,et al.  Scaling Reinforcement Learning Techniques via Modularity , 1991, ML.

[61]  Gary L. Drescher,et al.  Made-up minds - a constructivist approach to artificial intelligence , 1991 .

[62]  Jürgen Schmidhuber  Neural sequence chunkers , 1991, Forschungsberichte, TU Munich.

[63]  Oren Etzioni,et al.  Why PRODIGY/EBL Works , 1990, AAAI.

[64]  Rodney A. Brooks,et al.  Learning to Coordinate Behaviors , 1990, AAAI.

[65]  Steven Minton,et al.  Quantitative Results Concerning the Utility of Explanation-based Learning , 1988, Artif. Intell..

[66]  Allen Newell,et al.  Chunking in Soar , 1986 .

[67]  J. Brown,et al.  A Qualitative Physics Based on Confluences , 1984, Artif. Intell..

[68]  R. Korf Learning to solve problems by searching for macro-operators , 1983 .

[69]  Benjamin Kuipers,et al.  Common-Sense Knowledge of Space: Learning from Experience , 1979, IJCAI.

[70]  Nils J. Nilsson,et al.  Artificial Intelligence , 1974, IFIP Congress.

[71]  Earl D. Sacerdoti,et al.  Planning in a Hierarchy of Abstraction Spaces , 1974, IJCAI.

[72]  Allen Newell,et al.  Human Problem Solving , 1973 .

[73]  Richard Fikes,et al.  Learning and Executing Generalized Robot Plans , 1993, Artif. Intell..

[74]  R. Howard Dynamic Programming and Markov Processes , 1960 .