Learning Multiple Goal Behavior via Task Decomposition and Dynamic Policy Merging

An ability to coordinate the pursuit of multiple, time-varying goals is important to an intelligent robot. In this chapter we consider the application of reinforcement learning to a simple class of dynamic multi-goal tasks. Not surprisingly, we find that the most straightforward, monolithic approach scales poorly, since the size of the state space is exponential in the number of goals. As an alternative, we propose a simple modular architecture which distributes the learning and control task amongst a set of separate control modules, one for each goal that the agent might encounter. Learning is facilitated since each module learns the optimal policy associated with its goal without regard for other current goals. This greatly simplifies the state representation and shortens learning time compared to a single monolithic controller. When the robot is faced with a single goal, the module associated with that goal is used to determine the overall control policy. When the robot is faced with multiple goals, information from each associated module is merged to determine the policy for the combined task. In general, these merged strategies yield good but suboptimal performance. Thus, the architecture trades poor initial performance, slow learning, and an optimal asymptotic policy in favor of good initial performance, fast learning, and a slightly sub-optimal asymptotic policy. We consider several merging strategies, from simple ones that compare and combine modular information about the current state only, to more sophisticated strategies that use lookahead search to construct more accurate utility estimates.
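To make the architecture concrete, the following is a minimal sketch of per-goal learning modules and a simple merge of their utility estimates. It assumes tabular Q-learning within each module and an additive ("greatest mass" style) merging rule; all class and function names are hypothetical illustrations, not the chapter's actual implementation, and the chapter's merging strategies differ in detail.

```python
import random
from collections import defaultdict


class GoalModule:
    """Tabular Q-learning module for a single goal.

    The module's state is local to its goal (e.g., the agent's position
    relative to that goal), so its table stays small no matter how many
    other goals happen to be active at the same time.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)   # (state, action) -> utility estimate
        self.actions = actions
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor

    def update(self, s, a, r, s_next):
        """One-step Q-learning backup using only this goal's reward."""
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        target = r + self.gamma * best_next
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

    def utilities(self, s):
        """Per-action utility estimates for the module's local state."""
        return {a: self.q[(s, a)] for a in self.actions}


def merge_and_act(modules, local_states, epsilon=0.1):
    """Pick an action for the combined task by merging per-goal utilities.

    The merge here simply sums each active module's Q-values; the
    resulting policy is typically good but not optimal for the joint task.
    """
    actions = modules[0].actions
    merged = {a: 0.0 for a in actions}
    for module, s in zip(modules, local_states):
        for a, u in module.utilities(s).items():
            merged[a] += u
    if random.random() < epsilon:     # occasional exploration
        return random.choice(actions)
    return max(merged, key=merged.get)
```

Because each module sees only its own goal's state and reward, the combined table size grows linearly rather than exponentially in the number of goals, which is the source of the faster learning noted above; the more sophisticated merging strategies discussed later replace the simple sum with lookahead search over the modules' estimates.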