Discrepancy Search with Reactive Policies for Planning

We consider a novel use of mostly-correct reactive policies. In classical planning, reactive policy learning approaches can find good policies from solved trajectories of small problems, and the learned policies have been successfully applied to larger problems. Due to their inductive nature, however, the learned policies are typically only mostly correct: they commit errors on some fraction of the states, which prevents them from solving every problem in the domain. When reward is given only at goal states, the well-known policy rollout approach cannot improve the performance of such faulty policies. Discrepancy search was originally developed to exploit the structural information of heuristic functions, which tend to be mostly correct owing to human engineering. In this paper, we use reactive policies in place of heuristic functions in discrepancy search for planning. In our initial experiments, the proposed approach is effective at improving the performance of the given faulty reactive policies, outperforming both policy rollout and the reactive policies themselves. We conclude with a research plan for extending the current proposal.
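
The failure mode of policy rollout mentioned above can be made concrete. The following is a minimal sketch, not the authors' implementation; the interfaces `actions`, `step`, `policy`, and `is_goal` are hypothetical. With goal-only reward, rollout scores each action by simulating the base policy from the resulting state, so if the faulty policy never reaches the goal from any successor, every action scores zero and rollout degenerates to arbitrary tie-breaking.

```python
from typing import Callable, Hashable, Iterable

# Hypothetical interfaces: these names are illustrative assumptions,
# not part of the paper.
State = Hashable
Action = Hashable


def rollout_action(
    state: State,
    actions: Callable[[State], Iterable[Action]],
    step: Callable[[State, Action], State],
    policy: Callable[[State], Action],
    is_goal: Callable[[State], bool],
    horizon: int = 50,
) -> Action:
    """One-step policy rollout under goal-only reward."""

    def simulate(s: State) -> int:
        # Follow the base policy for up to `horizon` steps; the reward
        # is 1 only if a goal state is reached, and 0 otherwise.
        for _ in range(horizon):
            if is_goal(s):
                return 1
            s = step(s, policy(s))
        return 1 if is_goal(s) else 0

    # If the faulty policy reaches the goal from none of the successor
    # states, every action ties at 0 and rollout cannot improve on it.
    return max(actions(state), key=lambda a: simulate(step(state, a)))
```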

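In contrast, discrepancy search treats the policy's suggestion as a preference that may occasionally be overridden. Below is a minimal sketch of limited discrepancy search guided by a reactive policy, under the same hypothetical interfaces as above; it illustrates the idea rather than reproducing the authors' code. Following the policy's preferred action is free, deviating from it consumes one discrepancy, and the search restarts with increasing discrepancy budgets so that plans deviating least from the mostly-correct policy are found first.

```python
from typing import Callable, Hashable, Iterable, List, Optional

# Hypothetical interfaces, as in the rollout sketch above.
State = Hashable
Action = Hashable


def lds_with_policy(
    start: State,
    actions: Callable[[State], Iterable[Action]],
    step: Callable[[State, Action], State],
    policy: Callable[[State], Action],
    is_goal: Callable[[State], bool],
    max_discrepancies: int = 3,
    depth_limit: int = 50,
) -> Optional[List[Action]]:
    """Limited discrepancy search with a reactive policy in place of a
    heuristic: allow at most k deviations from the policy, for
    k = 0, 1, ..., max_discrepancies."""

    def dfs(s: State, k: int, depth: int,
            plan: List[Action]) -> Optional[List[Action]]:
        if is_goal(s):
            return plan
        if depth == depth_limit:
            return None
        preferred = policy(s)
        # Following the policy's suggestion costs no discrepancies.
        found = dfs(step(s, preferred), k, depth + 1, plan + [preferred])
        if found is not None:
            return found
        # Deviating from the policy consumes one discrepancy.
        if k > 0:
            for a in actions(s):
                if a == preferred:
                    continue
                found = dfs(step(s, a), k - 1, depth + 1, plan + [a])
                if found is not None:
                    return found
        return None

    # Iteratively allow more discrepancies, so plans that deviate least
    # from the (mostly-correct) policy are found first.
    for k in range(max_discrepancies + 1):
        plan = dfs(start, k, 0, [])
        if plan is not None:
            return plan
    return None
```

Restarting with a larger budget re-explores low-discrepancy paths; Korf's improved limited discrepancy search removes that redundancy, and the same refinement would apply here.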