
Accelerating Search with Transferred Heuristics

A common goal for transfer learning research is to show that a learner can solve a source task and then leverage the learned knowledge to solve a target task faster than if it had learned the target task directly. A more difficult goal is to reduce the total training time, so that learning the source task and the target task together is faster than learning only the target task. This paper addresses the second goal by proposing a transfer hierarchy for 2-player games. Such a hierarchy orders games by relative solution difficulty and can be used to select source tasks that are faster to learn than a given target task. We empirically test transfer between two types of tasks in the General Game Playing domain, the testbed for an international competition developed at Stanford. Our results show that transferring learned search heuristics from tasks in different parts of the hierarchy can significantly speed up search, even when the source and target tasks differ along a number of important dimensions.

[…] task (called an auxiliary problem by Polya) is faster to solve than the target task, and the speedup in target task training time outweighs the time spent learning the source task. To achieve this goal the learner must reason about all three steps. This paper takes a first step toward the difficult problem of discovering appropriate source tasks by proposing a transfer hierarchy. Such a structure defines types of games that require more or less information to solve, and thus may be used to order tasks by their relative solution complexity. Such an ordering can be used to identify source tasks that will take significantly less time to solve than a particular target task, reducing the impact of source task training on the total training time. In the future we hope that such a transfer hierarchy will help automate the transfer learning process by assisting in the selection of a source task for a given target task.
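The two goals above reduce to simple inequalities over training times, and the hierarchy's role is to propose source tasks that are cheaper than the target. The following is a minimal sketch under stated assumptions, not the authors' method: the `Task` fields (`level`, `est_solve_time`), the helper names, and all numbers are hypothetical, chosen only to show how an ordering by "information needed to solve" could filter candidate source tasks and how the target-time and total-time goals differ.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    level: int             # hypothetical hierarchy level: lower = needs less information to solve
    est_solve_time: float  # hypothetical estimate of time to solve, in arbitrary units


def candidate_sources(tasks, target):
    """Tasks below the target in the hierarchy that are also cheaper to solve, cheapest first."""
    cheaper = [
        t for t in tasks
        if t.level < target.level and t.est_solve_time < target.est_solve_time
    ]
    return sorted(cheaper, key=lambda t: t.est_solve_time)


def meets_target_time_goal(t_target_scratch, t_target_transfer):
    """Goal 1: with transfer, the target task is learned faster than from scratch."""
    return t_target_transfer < t_target_scratch


def meets_total_time_goal(t_source, t_target_scratch, t_target_transfer):
    """Goal 2: source training plus transferred target training beats learning the target alone."""
    return t_source + t_target_transfer < t_target_scratch


if __name__ == "__main__":
    # Hypothetical tasks; "C" plays the role of the target task.
    tasks = [Task("A", 0, 5.0), Task("B", 1, 20.0), Task("C", 2, 100.0)]
    print([t.name for t in candidate_sources(tasks, tasks[2])])  # ['A', 'B']
    print(meets_target_time_goal(100.0, 80.0))        # True
    print(meets_total_time_goal(5.0, 100.0, 80.0))    # True: 5 + 80 < 100
    print(meets_total_time_goal(40.0, 100.0, 80.0))   # False: 40 + 80 >= 100
```

The last two calls illustrate why the second goal is harder: the same transferred speedup on the target task (100 down to 80 units) satisfies the total-time goal only when the source task is cheap enough, which is exactly what an ordering by relative solution difficulty is meant to help find.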
In this paper we begin to evaluate the effectiveness of our proposed hierarchy by manually constructing source tasks for a specified target task, where the selection of source tasks is motivated by the transfer hierarchy. To empirically demonstrate transfer between source and target tasks taken from our transfer hierarchy, we use the game of Mummy Maze. This game is an appropriate choice for two reasons. First, it has been released as a sample domain in the General Game Playing (GGP) contest (Genesereth & Love 2005), an international competition developed independently at Stanford. Second, the Mummy Maze task is easily modified so that it can conform to each task type in our transfer hierarchy. Our results show that a transferred heuristic can improve the speed of search by as much as 34%, meeting the target time goal, even when our source tasks differ from the target tasks along a number of dimensions. Additionally, we demonstrate how the total training time goal may also be met for this particular pair of source and target types, depending on information-gathering costs.
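To make concrete how a heuristic speeds up search, here is a hedged sketch on a hypothetical single-agent grid maze. It is not the paper's setup: the maze layout, `best_first_search`, and `percent_reduction` are illustrative assumptions, Manhattan distance stands in for a learned and transferred heuristic, node expansions stand in for search time, and the mummy adversary of Mummy Maze is ignored. Setting `weight=0` gives uniform-cost search (no guidance); `weight=1` gives A* with the heuristic.

```python
import heapq

# Hypothetical 4-connected maze: 'S' start, 'G' goal, '#' wall, '.' free cell.
MAZE = [
    "S...#.....",
    ".##.#.###.",
    ".#..#...#.",
    ".#.####.#.",
    ".#......#.",
    ".######.#.",
    "........#G",
]


def find(maze, ch):
    """Return the (row, col) of the first occurrence of ch."""
    for r, row in enumerate(maze):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not in maze")


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def best_first_search(maze, weight):
    """Return (path_cost, nodes_expanded); weight=0 is uniform-cost search, weight=1 is A*."""
    start, goal = find(maze, "S"), find(maze, "G")
    rows, cols = len(maze), len(maze[0])
    # Heap entries: (f, -g, cell); the -g term breaks f-ties toward deeper nodes.
    frontier = [(weight * manhattan(start, goal), 0, start)]
    best_g = {start: 0}
    closed = set()
    expanded = 0
    while frontier:
        _, neg_g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale entry superseded by a cheaper path
        closed.add(cell)
        expanded += 1
        g = -neg_g
        if cell == goal:
            return g, expanded
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nxt = (nr, nc)
            if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] != "#" and nxt not in closed:
                ng = g + 1  # unit step cost
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(frontier, (ng + weight * manhattan(nxt, goal), -ng, nxt))
    return None, expanded  # goal unreachable


def percent_reduction(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    ucs_cost, ucs_nodes = best_first_search(MAZE, weight=0)
    astar_cost, astar_nodes = best_first_search(MAZE, weight=1)
    print(f"cost {ucs_cost} vs {astar_cost}; expansions {ucs_nodes} vs {astar_nodes}")
    print(f"reduction in expansions: {percent_reduction(ucs_nodes, astar_nodes):.1f}%")
```

Because Manhattan distance is consistent on a unit-cost 4-connected grid, both searches return the same optimal cost, and A* never expands more nodes than uniform-cost search; how large the reduction is depends on the maze and on how informative the heuristic is, which is the quantity a transferred heuristic aims to improve.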

Peter Stone | Matthew E. Taylor | Gregory Kuhlmann

[1] Earl D. Sacerdoti, et al. Planning in a Hierarchy of Abstraction Spaces, 1974, IJCAI.

[2] Peter Stone, et al. Value Functions for RL-Based Behavior Transfer: A Comparative Study, 2005, AAAI.

[3] C. W. Tate. Solve it, 2005, Nursing Standard (Royal College of Nursing (Great Britain): 1987).

[4] Michael R. Genesereth, et al. General Game Playing: Overview of the AAAI Competition, 2005, AI Mag.

[5] Peter Stone, et al. State Abstraction Discovery from Irrelevant State Variables, 2005, IJCAI.

[6] Michael R. Genesereth, et al. Knowledge Interchange Format, 1991, KR.

[7] Craig A. Knoblock. Automatically Generating Abstractions for Planning, 1994, Artif. Intell.

[8] Peter Stone, et al. Improving Action Selection in MDP's via Knowledge Transfer, 2005, AAAI.

[9] Peter Stone, et al. Learning to Solve Complex Planning Problems: Finding Useful Auxiliary Problems, 1994.

[10] Raymond J. Mooney, et al. Using Active Relocation to Aid Reinforcement Learning, 2006, FLAIRS.

[11] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.