Paper Information - Learning Plans without a priori Knowledge

Learning Plans without a priori Knowledge

This paper is concerned with the autonomous learning of plans in probabilistic domains without a priori domain-specific knowledge. In contrast to existing reinforcement learning algorithms that generate only reactive plans, and existing probabilistic planning algorithms that require a substantial amount of a priori knowledge in order to plan, a two-stage bottom-up process is devised in which first reinforcement learning/dynamic programming is applied, without the use of a priori domain-specific knowledge, to acquire a reactive plan, and then explicit plans are extracted from the reactive plan. Several options for plan extraction are examined, each of which is based on a beam search that performs temporal projection in a restricted fashion, guided by the value functions resulting from reinforcement learning/dynamic programming. Some completeness and soundness results are given. Examples in several domains are discussed that together demonstrate the working of the proposed model.
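The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is one simple reading of "value function first, then beam-search plan extraction", written under several assumptions. The toy domain `P`, the reward of 1 at the goal, the discount `GAMMA`, and the function names `q_value_iteration` and `extract_plan` are all hypothetical. For brevity the transition table is given directly, whereas in the paper's setting such estimates would have to be acquired during learning.

```python
from collections import defaultdict

# Hypothetical toy domain: states 0..2 on a line, state 3 is the goal.
# P[s][a] is a list of (probability, next_state) pairs.
P = {
    0: {"right": [(0.9, 1), (0.1, 0)], "stay": [(1.0, 0)]},
    1: {"right": [(0.9, 2), (0.1, 1)], "stay": [(1.0, 1)]},
    2: {"right": [(0.9, 3), (0.1, 2)], "stay": [(1.0, 2)]},
}
GOAL = 3
GAMMA = 0.9  # assumed discount factor


def q_value_iteration(P, goal, gamma, sweeps=200):
    """Stage 1: dynamic programming yields Q(s, a), i.e. a reactive plan."""
    Q = {s: {a: 0.0 for a in acts} for s, acts in P.items()}
    for _ in range(sweeps):
        for s, acts in P.items():
            for a, outcomes in acts.items():
                total = 0.0
                for p, s2 in outcomes:
                    if s2 == goal:
                        total += p * 1.0  # assumed reward of 1 at the goal
                    else:
                        total += p * gamma * max(Q[s2].values())
                Q[s][a] = total
    return Q


def extract_plan(P, Q, start, goal, horizon, beam_width):
    """Stage 2: extract an open-loop action sequence by beam search.

    The beam is a probability distribution over non-goal states. At each
    step the action with the highest expected Q under the beam is chosen,
    the distribution is projected one step forward, and only the
    beam_width most probable states are kept.
    """
    dist = {start: 1.0}
    reached = 0.0  # probability mass known to have reached the goal
    plan = []
    for _ in range(horizon):
        if not dist:
            break
        # Only actions available in every state of the beam are candidates.
        actions = set.intersection(*(set(P[s]) for s in dist))
        best = max(
            sorted(actions),
            key=lambda a: sum(p * Q[s][a] for s, p in dist.items()),
        )
        plan.append(best)
        nxt = defaultdict(float)
        for s, p in dist.items():
            for q, s2 in P[s][best]:
                if s2 == goal:
                    reached += p * q
                else:
                    nxt[s2] += p * q
        # Restricted temporal projection: prune to the most probable states.
        kept = sorted(nxt.items(), key=lambda kv: -kv[1])[:beam_width]
        dist = dict(kept)
    return plan, reached


Q = q_value_iteration(P, GOAL, GAMMA)
plan, reached = extract_plan(P, Q, start=0, goal=GOAL, horizon=5, beam_width=2)
print(plan, reached)
```

Because pruned states are discarded, `reached` in this sketch is only a lower bound on the plan's true success probability. A wider beam tightens the bound at the cost of more projection work.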

Ron Sun | Chad Sessions
