Action Selection Methods Using Reinforcement Learning

Action Selection schemes, when translated into precise algorithms, typically involve considerable design effort and tuning of parameters. Little work has been done on solving the problem using learning. This paper compares eight different methods of solving the action selection problem using Reinforcement Learning (learning from rewards). The methods range from centralised and cooperative to decentralised and selfish. They are tested in an artificial world and their performance, memory requirements and reactiveness are compared. Finally, the possibility of more exotic, ecosystem-like decentralised models is considered.

1 Action Selection

By Action Selection we do not mean the low-level problem of choice of action in pursuit of a single coherent goal. Rather we mean the higher-level problem of choice between conflicting and heterogeneous goals. These goals are pursued in parallel. They may sometimes combine to achieve larger-scale goals, but in general they simply interfere with each other. They may not have any terminating conditions.

Typically, the action selection models proposed in ethology are not detailed enough to specify an algorithmic implementation (see [Tyrrell, 1993] for a survey, and for some difficulties in translating the conceptual models into computational ones). The models that do lend themselves to algorithmic implementation then typically require a considerable design effort. In the literature, one sees formulas taking weighted sums of various quantities in an attempt to estimate the utility of actions. There is much hand-coding and tuning of parameters until the designer is satisfied that the formulas deliver utility estimates that are fair.

In fact, there may be a way that these utility values can come for free. Learning methods that automatically assign values to actions are common in the field of Reinforcement Learning (RL) [Kaelbling, 1993]. Reinforcement Learning propagates numeric rewards into behaviour patterns. The rewards may be external value judgements, or just internally generated numbers. This paper compares eight different methods of further propagating these numbers to solve the action selection problem. The low-level problem of pursuing a single goal can be solved by straightforward RL, which assumes such a single goal (a minimal sketch of such a learner is given at the end of this section). For the high-level problem of choice between conflicting goals we try various methods that exploit the low-level RL numbers.

1.1 Multi-module Reinforcement Learning

In general, Reinforcement Learning work has concentrated on problems with a single goal. For complex problems that need to be broken into subproblems, most of the work either designs the decomposition by hand [Moore, 1990], or deals with problems where the sub-tasks have termination …
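To make the "straightforward RL" baseline concrete, the following is a minimal sketch of a single-goal learner using tabular Q-learning, one standard way of propagating numeric rewards into action values. It is illustrative only: the action set, reward signal, and parameter values are invented for the sketch and are not taken from the paper's artificial world.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning for a single goal (a sketch, not the
# paper's method; environment and parameters are assumptions).

ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor for future rewards
EPSILON = 0.1  # exploration rate
ACTIONS = ["north", "south", "east", "west"]  # hypothetical action set

Q = defaultdict(float)  # Q[(state, action)] -> learned utility estimate

def choose_action(state):
    """Epsilon-greedy: usually pick the highest-valued action."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One-step Q-learning update: propagate reward back into Q."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                   - Q[(state, action)])
```

Such a learner yields utility estimates Q(state, action) "for free" as a by-product of experience. The action selection problem the paper addresses then becomes how to arbitrate when several such learners, each pursuing its own goal with its own rewards, propose different actions in the same state.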