Learning to Rank for Synthesizing Planning Heuristics

We investigate learning heuristics for domain-specific planning. Prior work framed learning a heuristic as an ordinary regression problem. However, in greedy best-first search, the ordering of states induced by a heuristic is more indicative of the resulting planner's performance than the heuristic's mean squared error. We therefore frame heuristic learning as a learning-to-rank problem, which we solve using a RankSVM formulation. Additionally, we introduce new methods for computing features that capture temporal interactions in an approximate plan. Our experiments on recent International Planning Competition problems show that the RankSVM-learned heuristics outperform both the original heuristics and heuristics learned through ordinary regression.
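To make the ranking view concrete, the following is a minimal sketch of a pairwise hinge-loss objective in the spirit of RankSVM. It is not the paper's implementation. The feature vectors, cost-to-go labels, function names, and hyperparameters are all hypothetical, and plain stochastic subgradient descent stands in for whatever solver the authors used. The only property being optimized is the one the abstract highlights: a state closer to the goal should receive a lower heuristic value than a state farther away.

```python
import random


def pairwise_hinge_train(features, costs, epochs=200, lr=0.01, reg=1e-3, seed=0):
    """Fit linear weights w so that w.x_i < w.x_j whenever costs[i] < costs[j].

    Hypothetical helper: minimizes an L2-regularized pairwise hinge loss
    with stochastic subgradient steps (an assumption, not the paper's solver).
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    n = len(costs)
    # Ordered pairs (i, j) where state i is strictly closer to the goal than j.
    pairs = [(i, j) for i in range(n) for j in range(n) if costs[i] < costs[j]]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j in pairs:
            # Difference vector: we want w . (x_j - x_i) >= 1.
            diff = [b - a for a, b in zip(features[i], features[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            # Shrink weights (gradient of the L2 regularizer).
            w = [wk * (1.0 - lr * reg) for wk in w]
            # Hinge subgradient step only when the margin is violated.
            if margin < 1.0:
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


def heuristic(w, x):
    """Learned heuristic value: a linear function of the state features."""
    return sum(wk * xk for wk, xk in zip(w, x))


# Toy data (made up): feature 0 tracks distance to goal, feature 1 is noise.
toy_features = [[1, 0.5], [2, 0.1], [3, 0.4], [4, 0.2], [5, 0.3]]
toy_costs = [1, 2, 3, 4, 5]
toy_w = pairwise_hinge_train(toy_features, toy_costs)
toy_scores = [heuristic(toy_w, x) for x in toy_features]
```

Note that the loss never compares a predicted value with the true cost-to-go. A heuristic trained this way can be far from the true costs in the squared-error sense and still order states correctly, which is what greedy best-first search depends on.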
