Hierarchical Strategy Learning with Hybrid Representations

Good problem-solving knowledge for real-life domains is hard to capture in a single representation. In some situations a direct policy is the better choice, while in others a value function is. Typically, a direct policy representation suits strategic-level plans, while a value-function representation suits tactical-level plans. We propose the hybrid hierarchical representation machine (HHRM), in which direct-policy and value-function-based representations co-exist in a level-wise fashion. We provide simple learning and planning algorithms for the new representation and discuss their application to the Airspace Deconfliction domain. In our experiments, we provided our system, LSP, with a two-level HHRM for the domain. LSP successfully learned from a limited number of expert solution traces and showed performance superior to the average of human novice learners.
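
To make the level-wise split concrete, the following is a minimal sketch of how a two-level hybrid policy could be organized: a strategic level driven by a direct (rule-like) policy that selects a subtask, and a tactical level that acts greedily with respect to a per-subtask value function. All names here (HybridHierarchicalPolicy, strategic_policy, tactical_q) are illustrative assumptions, not the paper's actual LSP implementation.

    from typing import Callable, Dict, Hashable, List

    State = Hashable
    Action = str

    class HybridHierarchicalPolicy:
        """Two-level hybrid: direct policy on top, value function below.

        Hypothetical sketch; the paper's HHRM/LSP internals may differ.
        """

        def __init__(self,
                     strategic_policy: Callable[[State], str],
                     tactical_q: Dict[str, Callable[[State, Action], float]],
                     actions: List[Action]):
            self.strategic_policy = strategic_policy  # direct policy: state -> subtask name
            self.tactical_q = tactical_q              # per-subtask state-action value estimate
            self.actions = actions                    # primitive action set

        def act(self, state: State) -> Action:
            # Strategic level: the direct policy commits to a subtask outright.
            subtask = self.strategic_policy(state)
            # Tactical level: choose the primitive action that maximizes the
            # learned value function for the chosen subtask.
            q = self.tactical_q[subtask]
            return max(self.actions, key=lambda a: q(state, a))

    # Toy usage with hand-written components (for illustration only):
    policy = HybridHierarchicalPolicy(
        strategic_policy=lambda s: "evade" if s == "conflict" else "cruise",
        tactical_q={
            "evade":  lambda s, a: 1.0 if a == "turn_left" else 0.0,
            "cruise": lambda s, a: 1.0 if a == "hold" else 0.0,
        },
        actions=["turn_left", "turn_right", "hold"],
    )
    assert policy.act("conflict") == "turn_left"

The design point this illustrates is that each level can use whichever representation fits it: the strategic level needs no value estimates at all, while the tactical level needs no explicit rules.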
