Paper Information - Learning to Solve Multiple Goals

Learning to Solve Multiple Goals

In many domains, the task can be decomposed into a set of independent sub-goals. Often, such tasks are too complex to be learned using standard techniques such as Reinforcement Learning. The complexity is caused by the learning system having to keep track of the status of all sub-goals concurrently. Thus, if the solution to one sub-goal is known when another sub-goal is in some given state, the known solution must be relearned when the status of the other sub-goal changes. This dissertation presents a modular approach to reinforcement learning that takes advantage of task decomposition to avoid unnecessary relearning. In the modular approach, modules are created to learn each sub-goal. Each module receives only those inputs relevant to its associated sub-goal, and can therefore learn without being affected by the state of other sub-goals. Furthermore, each module searches a much smaller space than that defined by all inputs considered together, thereby greatly reducing learning time. Since each module learns how to achieve a separate sub-goal, at any given time it may recommend an action different from that recommended by other modules. To select an action that best satisfies as many of the modules as possible, a simple arbitration strategy is used. One such strategy, explored in this dissertation, is called greatest mass, which simply combines action utilities from all modules and selects the action with the largest combined utility. Since the modular approach limits and separates information given to the modules, the solution learned must necessarily differ from that learned by a standard, non-modular approach. However, experiments in a simple driving world indicate that while sub-optimal, the solution learned by the modular system makes only minor errors when compared with that learned by the standard approach. Using the modular approach, a complex task can thus be learned very quickly, with only small amounts of computational resources and only small sacrifices in solution quality.
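
The greatest mass arbitration described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the learning rule or the state encoding, so this sketch assumes tabular Q-learning inside each module; the names `Module`, `greatest_mass`, and the driving-style states and utilities are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class Module:
    """One learner per sub-goal; it sees only its own local state (assumed tabular Q-learning)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.q = defaultdict(float)  # (local_state, action) -> utility

    def utilities(self, local_state):
        """Return this module's utility estimate for every action."""
        return {a: self.q[(local_state, a)] for a in self.actions}

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update on the module's local state."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def greatest_mass(modules, local_states):
    """Sum each action's utility over all modules and pick the largest total."""
    actions = modules[0].actions
    totals = {a: 0.0 for a in actions}
    for module, state in zip(modules, local_states):
        for action, utility in module.utilities(state).items():
            totals[action] += utility
    return max(actions, key=lambda a: totals[a])


# Hypothetical example: two modules with hand-set utilities.
actions = ["left", "stay", "right"]
avoid = Module(actions)  # sub-goal: avoid the car ahead
goal = Module(actions)   # sub-goal: reach the exit on the right
avoid.q[("car_ahead", "left")] = 0.6
avoid.q[("car_ahead", "stay")] = -1.0
avoid.q[("car_ahead", "right")] = 0.5
goal.q[("exit_right", "left")] = -0.2
goal.q[("exit_right", "stay")] = 0.1
goal.q[("exit_right", "right")] = 0.8

chosen = greatest_mass([avoid, goal], ["car_ahead", "exit_right"])
print(chosen)
```

In this example the avoidance module alone would prefer `left`, but the summed utilities favor `right`, the action that serves both sub-goals reasonably well. Each module's table is indexed only by its own local state, which is the source of the smaller search space the abstract mentions.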

Jonas Karlsson
