Bagging strategies for learning planning policies

In this paper we describe ENSEMBLE-ROLLER, a learning-based automated planner that uses a bagging approach to enhance existing techniques for learning planning policies. Previous policy-style planning and learning systems sort state successors based on action predictions from a relational classifier. However, these learning-based planners can produce poor-quality plans, since it is very difficult to encode in a single classifier all the situations that can occur in a planning domain. We propose using ensembles of relational classifiers to generate more robust policies. As in other applications of machine learning, the idea behind ensembles of classifiers is to combine members that are accurate in particular scenarios and diverse enough to cover a wide range of situations. In particular, ENSEMBLE-ROLLER learns ensembles of relational decision trees for each planning domain. The control knowledge from the different sets of trees is either aggregated into a single prediction or applied separately in a multiple-queue search algorithm. Experimental results show that both ways of using the new policies produce, on average, plans of better quality.
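
The aggregation idea described above can be illustrated with a minimal sketch. It is not the ENSEMBLE-ROLLER implementation: `LookupClassifier` is a hypothetical toy stand-in for a relational decision tree, states are reduced to a single string feature, and majority voting is only one plausible way to aggregate predictions into a single ranking of successor actions. All names below are assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(examples, rng):
    """Draw len(examples) items with replacement, as in bagging."""
    return [rng.choice(examples) for _ in examples]


class LookupClassifier:
    """Hypothetical stand-in for a relational decision tree.

    It memorises, per state feature, how often each action was the label.
    """

    def __init__(self, examples):
        self.counts = {}
        for feature, action in examples:
            self.counts.setdefault(feature, Counter())[action] += 1

    def predict(self, feature):
        # Return the most frequent action for this feature, or None if unseen.
        counts = self.counts.get(feature)
        return counts.most_common(1)[0][0] if counts else None


def train_ensemble(examples, n_members, seed=0):
    """Train each member on its own bootstrap replicate of the examples."""
    rng = random.Random(seed)
    return [LookupClassifier(bootstrap_sample(examples, rng))
            for _ in range(n_members)]


def rank_actions(ensemble, feature, candidate_actions):
    """Order candidate actions by the number of ensemble votes (most first).

    sorted() is stable, so actions with equal votes keep their input order.
    """
    votes = Counter(member.predict(feature) for member in ensemble)
    return sorted(candidate_actions, key=lambda action: -votes[action])


# Toy training data: (state feature, action chosen in a training plan).
examples = [("holding", "stack")] * 5 + [("clear", "pickup")] * 5
ensemble = train_ensemble(examples, n_members=5)
ranked = rank_actions(ensemble, "holding", ["pickup", "stack"])
```

In a policy-style planner, a ranking such as `ranked` would decide the order in which state successors are expanded. The alternative mentioned in the abstract, applying each set of trees separately in a multiple-queue search, would instead keep one ordering per ensemble member rather than merging the votes.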
