The First Learning Track of the International Planning Competition

The International Planning Competition is a biennial event organized in the context of the International Conference on Automated Planning and Scheduling. The 2008 competition included, for the first time, a learning track for comparing approaches that improve automated planners via learning. In this paper, we describe the structure of the learning track, the planning domains used for evaluation, the participating systems, the results, and our observations. To support the goal of domain-independent learning, a key feature of the competition was to disallow any code changes or parameter tweaks after the training domains were revealed to the participants. The competition results show that, at this stage, no learning-for-planning system outperforms state-of-the-art planners in a domain-independent manner across a wide range of domains; however, such systems appear to be close to providing that level of performance. Evaluating learning-for-planning systems in a blind competition raises important questions about the criteria that should be taken into account in future competitions.
