Learning to Solve Multiple Goals

In many domains, a task can be decomposed into a set of independent sub-goals. Such tasks are often too complex to be learned with standard techniques such as reinforcement learning, because the learning system must track the status of all sub-goals concurrently: even when the solution to one sub-goal is known for a given state of the others, that solution must be relearned whenever the status of another sub-goal changes.

This dissertation presents a modular approach to reinforcement learning that exploits task decomposition to avoid this unnecessary relearning. In the modular approach, a separate module is created to learn each sub-goal. Each module receives only those inputs relevant to its sub-goal and can therefore learn without being affected by the state of the other sub-goals. Furthermore, each module searches a much smaller space than the one defined by all inputs considered together, greatly reducing learning time. Because each module learns to achieve a different sub-goal, at any given time a module may recommend an action different from those recommended by the others. To select an action that satisfies as many modules as possible, a simple arbitration strategy is used. One such strategy, explored in this dissertation, is called greatest mass: it sums the action utilities reported by all modules and selects the action with the largest combined utility.

Since the modular approach limits and separates the information given to the modules, the solution it learns must necessarily differ from that learned by a standard, non-modular approach. Experiments in a simple driving world indicate, however, that while sub-optimal, the solution learned by the modular system makes only minor errors compared with that learned by the standard approach. With the modular approach, a complex task can thus be learned very quickly, using only small amounts of computational resources, at only a small cost in solution quality.
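As a rough illustration of the idea, a minimal sketch of modular Q-learning with greatest-mass arbitration might look like the following. The class and function names, the tabular Q-value representation, the learning parameters, and the example sub-goals are illustrative assumptions, not the dissertation's actual implementation.

```python
# Minimal sketch: each module learns Q-values over its own restricted inputs,
# and greatest-mass arbitration sums utilities across modules per action.
from collections import defaultdict


class SubGoalModule:
    """One module: learns action utilities for a single sub-goal,
    seeing only the state features relevant to that sub-goal."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)   # (module_state, action) -> utility
        self.actions = actions
        self.alpha = alpha
        self.gamma = gamma

    def utilities(self, state):
        """Return this module's utility estimate for every action."""
        return {a: self.q[(state, a)] for a in self.actions}

    def update(self, state, action, reward, next_state):
        """One-step Q-learning update using the module's own reward signal."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def greatest_mass(modules, module_states, actions):
    """Sum each action's utility across all modules and pick the largest total."""
    totals = {
        a: sum(m.utilities(s)[a] for m, s in zip(modules, module_states))
        for a in actions
    }
    return max(totals, key=totals.get)


# Hypothetical usage: two driving-world sub-goals, each with its own view.
actions = ["left", "stay", "right"]
modules = [SubGoalModule(actions), SubGoalModule(actions)]  # e.g. avoid-car, keep-lane
chosen = greatest_mass(modules, module_states=["near_car", "in_lane"], actions=actions)
```

The key point the sketch tries to capture is that each module's Q-table is indexed only by its own (small) state description, so changes in one sub-goal's status do not invalidate what another module has already learned; only the arbitration step combines the modules' utilities into a single action choice.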
