Action Selection schemes, when translated into precise algorithms, typically involve considerable design effort and tuning of parameters. Little work has been done on solving the problem using learning. This paper compares eight different methods of solving the action selection problem using Reinforcement Learning (learning from rewards). The methods range from centralised and cooperative to decentralised and selfish. They are tested in an artificial world and their performance, memory requirements and reactiveness are compared. Finally, the possibility of more exotic, ecosystem-like decentralised models is considered.

By Action Selection we do not mean the low-level problem of choice of action in pursuit of a single coherent goal. Rather we mean the higher-level problem of choice between conflicting and heterogeneous goals. These goals are pursued in parallel. They may sometimes combine to achieve larger-scale goals, but in general they simply interfere with each other. They may not have any terminating conditions.

Typically, the action selection models proposed in ethology are not detailed enough to specify an algorithmic implementation (see [Tyrrell, 1993] for a survey, and for some difficulties in translating the conceptual models into computational ones). The models that do lend themselves to algorithmic implementation then typically require a considerable design effort. In the literature, one sees formulas taking weighted sums of various quantities in an attempt to estimate the utility of actions. There is much hand-coding and tuning of parameters until the designer is satisfied that the formulas deliver utility estimates that are fair.

In fact, there may be a way that these utility values can come for free. Learning methods that automatically assign values to actions are common in the field of Reinforcement Learning (RL) [Kaelbling, 1993]. Reinforcement Learning propagates numeric rewards into behavior patterns. The rewards may be external value judgements, or just internally generated numbers. This paper compares eight different methods of further propagating these numbers to solve the action selection problem. The low-level problem of pursuing a single goal can be solved by straightforward RL, which assumes such a single goal. For the high-level problem of choice between conflicting goals we try various methods exploiting the low-level RL numbers.

In general, Reinforcement Learning work has concentrated on problems with a single goal. For complex problems that need to be broken into subproblems, most of the work either designs the decomposition by hand [Moore, 1990], or deals with problems where the sub-tasks have termination …
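To make the idea of exploiting the low-level RL numbers concrete, the following is a minimal sketch, not the eight methods actually compared in the paper: each goal gets its own tabular Q-learner, and a higher-level switch combines the agents' Q-values, either cooperatively (summing them) or selfishly (letting the agent with the strongest preference dictate the action). The names QAgent, select_collective, select_winner_take_all, step and the world callable are illustrative assumptions, not from the original.

```python
import random
from collections import defaultdict

class QAgent:
    """One low-level learner pursuing a single goal via one-step Q-learning."""
    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma

    def value(self, state, action):
        return self.q[(state, action)]

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup, using this agent's own reward signal.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

def select_collective(agents, state, actions):
    """'Cooperative' scheme: pick the action with the highest summed Q-value."""
    return max(actions, key=lambda a: sum(ag.value(state, a) for ag in agents))

def select_winner_take_all(agents, state, actions):
    """'Selfish' scheme: the agent with the single highest Q-value gets its way."""
    winner = max(agents, key=lambda ag: max(ag.value(state, a) for a in actions))
    return max(actions, key=lambda a: winner.value(state, a))

def step(agents, state, actions, world, select=select_collective, epsilon=0.1):
    """One interaction step: the switch picks an action, the (hypothetical)
    world callable returns the next state and one reward per goal, and every
    agent learns from its own reward component."""
    if random.random() < epsilon:
        action = random.choice(actions)        # occasional exploration at the switch
    else:
        action = select(agents, state, actions)
    next_state, rewards = world(state, action)
    for agent, r in zip(agents, rewards):
        agent.update(state, action, r, next_state)
    return next_state
```

Note that in this sketch each agent always updates from its own reward, whichever agent's preference won, so the per-goal Q-values are learned off-policy; the selection schemes differ only in how the switch combines those numbers.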
[1] Mark Humphrys. Action Selection in a hypothetical house robot: Using those RL numbers, 1996.
[2] G. Reeke. Marvin Minsky, The Society of Mind, Artif. Intell., 1991.
[3] Gerald Tesauro. Practical Issues in Temporal Difference Learning, 1992.
[4] Richard W. Prager et al. A Modular Q-Learning Architecture for Manipulator Task Decomposition, ICML, 1994.
[5] Mahesan Niranjan et al. On-line Q-learning using connectionist systems, 1994.
[6] Michael K. Sahota. Action selection for robots in dynamic environments through inter-behaviour bidding, 1994.
[7] Ben J. A. Kröse et al. Learning from delayed rewards, Robotics Auton. Syst., 1995.
[8] Bruce Blumberg et al. Action-selection in hamsterdam: lessons from ethology, 1994.
[9] Leslie Pack Kaelbling et al. Learning in embedded systems, 1993.
[10] Toby Tyrrell et al. Computational mechanisms for action selection, 1993.
[11] G. F. Tremblay et al. Bright Air, Brilliant Fire: On the Matter of the Mind, Gerald M. Edelman, Basic Books, New York, NY, ISBN 0-465-05245-2, 1992.
[12] R. M. Siegel et al. Foundations of Cognitive Science, Journal of Cognitive Neuroscience, 1990.
[13] Long Ji Lin et al. Scaling Up Reinforcement Learning for Robot Control, International Conference on Machine Learning, 1993.
[14] Thomas S. Ray et al. An Approach to the Synthesis of Life, 1991.