Learning Plans without a priori Knowledge

This paper is concerned with the autonomous learning of plans in probabilistic domains without a priori domain-specific knowledge. In contrast to existing reinforcement learning algorithms, which generate only reactive plans, and existing probabilistic planning algorithms, which require a substantial amount of a priori knowledge in order to plan, a two-stage bottom-up process is devised: first, reinforcement learning/dynamic programming is applied, without the use of a priori domain-specific knowledge, to acquire a reactive plan; then, explicit plans are extracted from the reactive plan. Several options for plan extraction are examined, each based on a beam search that performs temporal projection in a restricted fashion, guided by the value functions resulting from reinforcement learning/dynamic programming. Some completeness and soundness results are given. Examples in several domains are discussed that together demonstrate the working of the proposed model.
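To make the extraction stage concrete, the following Python sketch illustrates one way such a beam search could look: candidate plans are extended one action at a time (a restricted form of temporal projection that follows only each action's most probable outcome), and extensions are ranked by the value function learned in the first stage. The function names, the dictionary-of-dictionaries Q representation, and the toy domain are illustrative assumptions, not the paper's actual implementation.

```python
def extract_plan(start, goal, Q, transition_model, beam_width=3, max_depth=20):
    """Extract an explicit plan from a learned Q-function via beam search.

    Each beam entry is (score, state, plan). At every step, each action is
    projected forward in a restricted fashion: only its most probable
    successor is considered, and the extension is scored by the learned
    value of that successor. Returns a list of actions, or None.
    """
    beam = [(max(Q[start].values(), default=0.0), start, [])]
    for _ in range(max_depth):
        candidates = []
        for score, state, plan in beam:
            if state == goal:
                return plan
            for action in Q[state]:
                # Restricted temporal projection: follow only the most
                # likely outcome of the action, not all outcomes.
                succ, prob = max(transition_model(state, action),
                                 key=lambda sp: sp[1])
                value = max(Q[succ].values(), default=0.0)
                candidates.append((prob * value, succ, plan + [action]))
        # Keep only the top-k extensions (the beam).
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return None


if __name__ == "__main__":
    # A hypothetical two-step chain domain: s0 --right--> s1 --right--> s2.
    Q = {"s0": {"right": 0.9}, "s1": {"right": 1.0}, "s2": {}}

    def transition_model(state, action):
        # Each action succeeds with probability 0.9, else stays put.
        table = {("s0", "right"): [("s1", 0.9), ("s0", 0.1)],
                 ("s1", "right"): [("s2", 0.9), ("s1", 0.1)]}
        return table[(state, action)]

    print(extract_plan("s0", "s2", Q, transition_model))  # ['right', 'right']
```

Under these assumptions, the value function serves purely as a heuristic to prune the projection: the beam never enumerates the full probabilistic outcome tree, which is what keeps extraction tractable relative to full decision-theoretic planning.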
