What Next for Learning in AI Planning

This paper reports on a comprehensive survey of research on machine learning as applied to AI planning over the past 15 years. Major research contributions are characterized broadly by learning method and then grouped into descriptive subcategories. The survey results reveal learning techniques that have been extensively applied and a number that have received scant attention. We extend the survey analysis to suggest promising avenues for future research in learning, based on both previous experience and current needs in the planning community.

In the early to mid-1990s, learning's main role in AI planning was to make up for often debilitating weaknesses in the planners themselves. The general-purpose planning systems of even a decade ago struggled to solve simple problems in the classical benchmark domains; Blocksworld problems of 10 blocks lay beyond their capabilities, as did most logistics problems. The planners of the period generally employed only weak guidance in traversing their search spaces, so it is not surprising that augmenting the systems to learn some such guidance often proved effective. Relative to the largely naive base planner, the learning-enhanced systems demonstrated improvements in both the size of problems that could be addressed and the speed with which they could be solved (Minton '89; Leckie and Zukerman '93; Veloso and Carbonell '93; Kambhampati et al. '96).

With the advent of several new genres of planning systems in the past 5-6 years, the entire base performance level against which any learning-augmented system must compare has shifted dramatically. It is arguably a more difficult proposition to accelerate a planner of this generation by outfitting it with some form of online learning, as the overhead cost incurred by the learning system can overwhelm the gains in search efficiency. This may, in part, explain why the planning community has paid less attention to learning in recent years, at least as a tool to facilitate faster problem solving.

Of course, the planning community's interest in learning is not (and should not be) limited to the speedup benefits that might be achieved. As AI planning has advanced its reach to the point where it can cross the threshold from 'toy' problems to some interesting real-world applications, a range of issues comes into focus, from dealing with incomplete and uncertain environments to developing an effective interface with human users. What, then, are the most promising roles for learning in our current generation of planning systems? This paper comprises an overview of the role of machine learning in AI planning in the past, its status at present, and some of the as yet largely unexplored research directions.

[1]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[2]  Anthony G. Cohn,et al.  Automatically Synthesising Domain Constraints from Operator Descriptions , 1992, ECAI.

[3]  Dean A. Pomerleau,et al.  Knowledge-Based Training of Artificial Neural Networks for Autonomous Robot Driving , 1993 .

[4]  Stuart J. Russell,et al.  Bayesian Q-Learning , 1998, AAAI/IAAI.

[5]  Saso Dzeroski,et al.  Learning Nonrecursive Definitions of Relations with LINUS , 1991, EWSL.

[6]  William W. Cohen Learning Approximate Control Rules of High Utility , 1990, ML.

[7]  Tom Michael Mitchell,et al.  Explanation-based generalization: A unifying view , 1986 .

[8]  Subbarao Kambhampati,et al.  Failure Driven Dynamic Search Control for Partial Order Planners: An Explanation Based Approach , 1996, Artif. Intell..

[9]  Fu,et al.  Integration of neural heuristics into knowledge-based inference , 1989 .

[10]  Tara A. Estlin,et al.  Multi-Strategy Learning of Search Control for Partial-Order Planning , 1996, AAAI/IAAI, Vol. 1.

[11]  Thomas G. Dietterich,et al.  Learning with Many Irrelevant Features , 1991, AAAI.

[12]  Prasad Tadepalli,et al.  Learning from Queries and Examples with Tree-structured Bias , 1993, ICML.

[13]  Craig A. Knoblock,et al.  Learning Plan Rewriting Rules , 2000, AIPS.

[14]  Steven Minton,et al.  Quantitative Results Concerning the Utility of Explanation-based Learning , 1988, Artif. Intell..

[15]  Tom Michael Mitchell Learning Analytically and Inductively , 1995 .

[16]  Gerald DeJong,et al.  Real-World Robotics: Learning to Plan for Robust Execution , 2005, Machine Learning.

[17]  Subbarao Kambhampati,et al.  Plan-space vs state-space planning in reuse and replay , 1996 .

[18]  Subbarao Kambhampati,et al.  Planning as constraint satisfaction: Solving the planning graph by compiling it into CSP , 2001, Artif. Intell..

[19]  Thomas G. Dietterich,et al.  Explanation-Based Learning and Reinforcement Learning: A Unified View , 1995, Machine-mediated learning.

[20]  Paul R. Cohen,et al.  Learning Planning Operators in Real-World, Partially Observable Environments , 2000, AIPS.

[21]  Gerald DeJong,et al.  COMPOSER: A Probabilistic Solution to the Utility Problem in Speed-Up Learning , 1992, AAAI.

[22]  Subbarao Kambhampati,et al.  Design and Implementation of a Replay Framework Based on a Partial Order Planner , 1996, AAAI/IAAI, Vol. 1.

[23]  Jack Mostow,et al.  Failsafe - A Floor Planner that Uses EBG to Learn from Its Failures , 1987, IJCAI.

[24]  Roni Khardon,et al.  Learning Action Strategies for Planning Domains , 1999, Artif. Intell..

[25]  Michael J. Pazzani,et al.  A Knowledge-intensive Approach to Learning Relational Concepts , 1991, ML.

[26]  Andrew G. Barto,et al.  Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.

[27]  Pedro Isasi Viñuela,et al.  Genetic Programming and Deductive-Inductive Learning: A Multi-Strategy Approach , 1998, ICML.

[28]  Maria Fox,et al.  The Automatic Inference of State Invariants in TIM , 1998, J. Artif. Intell. Res..

[29]  Raymond J. Mooney,et al.  Theory Refinement Combining Analytical and Empirical Methods , 1994, Artif. Intell..

[30]  Lenhart K. Schubert,et al.  Accelerating Partial-Order Planners: Some Techniques for Effective Search Control and Pruning , 1996, J. Artif. Intell. Res..

[31]  Prasad Tadepalli,et al.  Lazy Explanation-Based Learning: A Solution to the Intractable Theory Problem , 1989, IJCAI.

[32]  Alberto M. Segre,et al.  The Peaks and Valleys of ALPS: an Adaptive Learning and Planning System for Transportation Scheduling , 1996 .

[33]  Jude Shavlik,et al.  An Approach to Combining Explanation-based and Neural Learning Algorithms , 1989 .

[34]  Oren Etzioni,et al.  Explanation-Based Learning: A Problem Solving Perspective , 1989, Artif. Intell..

[35]  Xuemei Wang,et al.  A Multistrategy Learning System for Planning Operator Acquisition , 1996 .

[36]  Bart Selman,et al.  The Role of Domain-Specific Knowledge in the Planning as Satisfiability Framework , 1998, AIPS.

[37]  Bart Selman,et al.  Learning Declarative Control Rules for Constraint-Based Planning , 2000, ICML.

[38]  Richard S. Sutton,et al.  Planning by Incremental Dynamic Programming , 1991, ML.

[39]  Subbarao Kambhampati,et al.  Planning Graph as a (Dynamic) CSP: Exploiting EBL, DDB and other CSP Search Techniques in Graphplan , 2000, J. Artif. Intell. Res..

[40]  Thomas G. Dietterich,et al.  Learning Boolean Concepts in the Presence of Many Irrelevant Features , 1994, Artif. Intell..

[41]  Kevin D. Ashley,et al.  Reasoning with Reasons in Case-Based Comparisons , 1995, ICCBR.

[42]  Bernhard Nebel,et al.  Ignoring Irrelevant Facts and Operators in Plan Generation , 1997, ECP.

[43]  Maria Fox,et al.  The Detection and Exploitation of Symmetry in Planning Problems , 1999, IJCAI.

[44]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[45]  Jude W. Shavlik,et al.  Learning Symbolic Rules Using Artificial Neural Networks , 1993, ICML.

[46]  Karen Zita Haigh,et al.  Interleaving Planning and Robot Execution for Asynchronous User Requests , 1998, Auton. Robots.

[47]  Kedar Cabelli Explanation-based Generalization as resolution theorem proving , 1987 .

[48]  Ken Lang,et al.  NewsWeeder: Learning to Filter Netnews , 1995, ICML.

[49]  Lenhart K. Schubert,et al.  Inferring State Constraints for Domain-Independent Planning , 1998, AAAI/IAAI.

[50]  Hector Geffner,et al.  Learning Generalized Policies in Planning Using Concept Languages , 2000, KR.

[51]  Ralph Bergmann,et al.  On the Role of Abstraction in Case-Based Reasoning , 1996, EWCBR.

[52]  Raymond J. Mooney,et al.  Combining FOIL and EBG to Speed-up Logic Programs , 1993, IJCAI.

[53]  Subbarao Kambhampati,et al.  Neural Network Guided Search Control in Partial Order Planning , 1996, AAAI/IAAI, Vol. 2.

[54]  Jussi Rintanen,et al.  An Iterative Algorithm for Synthesizing Invariants , 2000, AAAI/IAAI.

[55]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[56]  Henry Kautz,et al.  Blackbox: Unifying sat-based and graph-based planning , 1999, International Joint Conference on Artificial Intelligence.

[57]  Vincent Aleven,et al.  Reasoning Symbolically About Partially Matched Cases , 1997, IJCAI.

[58]  Philip J. Stone,et al.  Experiments in induction , 1966 .

[59]  Ramón García-Martínez,et al.  An Integrated Approach of Learning, Planning, and Execution , 2000, J. Intell. Robotic Syst..

[60]  J. R. Quinlan Learning Logical Definitions from Relations , 1990 .

[61]  Ingrid Zukerman,et al.  Inductive Learning of Search Control Rules for Planning , 1998, Artif. Intell..

[62]  Chris Watkins,et al.  Learning from delayed rewards , 1989 .

[63]  Subbarao Kambhampati,et al.  Storing and Indexing Plan Derivations through Explanation-based Analysis of Retrieval Failures , 1997, J. Artif. Intell. Res..

[64]  Allen Newell,et al.  SOAR: An Architecture for General Intelligence , 1987, Artif. Intell..

[65]  Fahiem Bacchus,et al.  Using temporal logics to express search control knowledge for planning , 2000, Artif. Intell..

[66]  Pedro M. Domingos,et al.  Version Space Algebra and its Application to Programming by Demonstration , 2000, ICML.