Discrepancy Search with Reactive Policies for Planning

We consider a novel use of mostly-correct reactive policies. In classical planning, reactive policy learning approaches can find good policies from solved trajectories of small problems, and the learned policies have been successfully applied to larger problems. Due to their inductive nature, however, the learned policies are typically only mostly correct: they commit errors on some fraction of the states, which prevents them from solving every problem in the domain. When reward is given only at goal states, the well-known policy rollout approach cannot improve the performance of such faulty policies. Discrepancy search was originally developed to exploit the structural information of heuristic functions, which tend to be mostly correct owing to human engineering. In this paper, we use reactive policies in place of heuristic functions in discrepancy search for planning. In our initial experiments, the proposed approach is effective at improving the performance of the given faulty reactive policies, outperforming both policy rollout and the reactive policies themselves. We conclude with a research plan for extending the current proposal.
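
The failure mode of policy rollout mentioned above can be made concrete. The following is a minimal sketch, not the authors' implementation; the interfaces `actions`, `step`, `policy`, and `is_goal` are hypothetical. With goal-only reward, rollout scores each action by simulating the base policy from the resulting state, so if the faulty policy never reaches the goal from any successor, every action scores zero and rollout degenerates to arbitrary tie-breaking.

```python
from typing import Callable, Hashable, Iterable

# Hypothetical interfaces: these names are illustrative assumptions,
# not part of the paper.
State = Hashable
Action = Hashable


def rollout_action(
    state: State,
    actions: Callable[[State], Iterable[Action]],
    step: Callable[[State, Action], State],
    policy: Callable[[State], Action],
    is_goal: Callable[[State], bool],
    horizon: int = 50,
) -> Action:
    """One-step policy rollout under goal-only reward."""

    def simulate(s: State) -> int:
        # Follow the base policy for up to `horizon` steps; the reward
        # is 1 only if a goal state is reached, and 0 otherwise.
        for _ in range(horizon):
            if is_goal(s):
                return 1
            s = step(s, policy(s))
        return 1 if is_goal(s) else 0

    # If the faulty policy reaches the goal from none of the successor
    # states, every action ties at 0 and rollout cannot improve on it.
    return max(actions(state), key=lambda a: simulate(step(state, a)))
```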

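In contrast, discrepancy search treats the policy's suggestion as a preference that may occasionally be overridden. Below is a minimal sketch of limited discrepancy search guided by a reactive policy, under the same hypothetical interfaces as above; it illustrates the idea rather than reproducing the authors' code. Following the policy's preferred action is free, deviating from it consumes one discrepancy, and the search restarts with increasing discrepancy budgets so that plans deviating least from the mostly-correct policy are found first.

```python
from typing import Callable, Hashable, Iterable, List, Optional

# Hypothetical interfaces, as in the rollout sketch above.
State = Hashable
Action = Hashable


def lds_with_policy(
    start: State,
    actions: Callable[[State], Iterable[Action]],
    step: Callable[[State, Action], State],
    policy: Callable[[State], Action],
    is_goal: Callable[[State], bool],
    max_discrepancies: int = 3,
    depth_limit: int = 50,
) -> Optional[List[Action]]:
    """Limited discrepancy search with a reactive policy in place of a
    heuristic: allow at most k deviations from the policy, for
    k = 0, 1, ..., max_discrepancies."""

    def dfs(s: State, k: int, depth: int,
            plan: List[Action]) -> Optional[List[Action]]:
        if is_goal(s):
            return plan
        if depth == depth_limit:
            return None
        preferred = policy(s)
        # Following the policy's suggestion costs no discrepancies.
        found = dfs(step(s, preferred), k, depth + 1, plan + [preferred])
        if found is not None:
            return found
        # Deviating from the policy consumes one discrepancy.
        if k > 0:
            for a in actions(s):
                if a == preferred:
                    continue
                found = dfs(step(s, a), k - 1, depth + 1, plan + [a])
                if found is not None:
                    return found
        return None

    # Iteratively allow more discrepancies, so plans that deviate least
    # from the (mostly-correct) policy are found first.
    for k in range(max_discrepancies + 1):
        plan = dfs(start, k, 0, [])
        if plan is not None:
            return plan
    return None
```

Restarting with a larger budget re-explores low-discrepancy paths; Korf's improved limited discrepancy search removes that redundancy, and the same refinement would apply here.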