A review of machine learning for automated planning

Recent discoveries in automated planning are broadening the scope of planners, from toy problems to real applications. However, applying automated planners to real-world problems is far from simple. On the one hand, defining accurate action models for planning remains a bottleneck. On the other hand, off-the-shelf planners fail to scale up and to provide good solutions in many domains. In such problematic domains, planners can exploit domain-specific control knowledge to improve both the speed and the quality of their solutions. However, defining control knowledge manually is quite difficult. This paper reviews recent machine learning techniques for the automatic definition of planning knowledge. The review is organized according to the target of the learning process: the automatic definition of planning action models and the automatic definition of planning control knowledge. In addition, the paper reviews advances in the related field of reinforcement learning.
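To make concrete what "action model" means here, the following is a minimal Python sketch of a STRIPS-style action with preconditions, add effects, and delete effects; the two-room "move" domain and all names are hypothetical, invented purely for illustration. Learning such models from observed plan traces is one of the two learning targets the review covers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """A STRIPS-style action model."""
    name: str
    preconditions: frozenset  # facts that must hold before execution
    add_effects: frozenset    # facts made true by the action
    del_effects: frozenset    # facts made false by the action

    def applicable(self, state: frozenset) -> bool:
        # The action can fire only if all preconditions hold.
        return self.preconditions <= state

    def apply(self, state: frozenset) -> frozenset:
        # Successor state: remove delete effects, then add add effects.
        assert self.applicable(state)
        return (state - self.del_effects) | self.add_effects

# A toy "move" action in a hypothetical two-room domain.
move_a_b = Action(
    name="move(a, b)",
    preconditions=frozenset({"at(a)"}),
    add_effects=frozenset({"at(b)"}),
    del_effects=frozenset({"at(a)"}),
)

state = frozenset({"at(a)"})
next_state = move_a_b.apply(state)
print(sorted(next_state))  # ['at(b)']
```

The difficulty the paper highlights is that, for real applications, the preconditions and effects above must be specified accurately for every action; learning approaches try to induce them from execution traces instead.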
