11 Hierarchical Approaches to Concurrency, Multiagency, and Partial Observability

Fig. 11.13 A hierarchical suffix memory state estimator for a robot navigation task. At the abstract (navigation) level, observations and decisions occur at intersections; at the lower (corridor-traversal) level, observations and decisions occur within the corridor. At each level, each agent constructs states out of its past experience with similar history (shown with shadows).

Partially observable MDPs are theoretically more powerful than finite-memory models, but past work on POMDPs has mostly studied "flat" models, for which learning and planning algorithms scale poorly with model size. We have developed a new hierarchical POMDP framework, termed H-POMDPs (see Fig. 11.14) [42], by extending the hierarchical hidden Markov model (HHMM) [7] to include rewards, multiple entry/exit points into abstract states, and (temporally extended) actions. H-POMDPs can also be represented as dynamic Bayesian networks [43], in much the same way that HHMMs can be represented as DBNs [23]. Figure 11.15 shows a dynamic Bayesian network representation of an H-POMDP. This model differs from the model described in [23] in two basic ways: the presence of action nodes A, and the fact that the exit nodes X are no longer binary.

Fig. 11.14 State transition diagram of a hierarchical POMDP used to model corridor environments. Large ovals represent abstract states; the small solid circles within them represent entry states, and the small hollow circles represent exit states. The small circles with arrows represent production states. Arcs represent non-zero transition probabilities as follows: dotted arrows from concrete states represent concrete horizontal transitions, dashed arrows from exit states represent abstract horizontal transitions, and solid arrows from entry states represent vertical transitions.
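To make the transition structure of Fig. 11.14 concrete, the sketch below encodes an H-POMDP's abstract states, their entry, exit, and production states, and the three kinds of transition distributions (vertical from entry states, abstract horizontal from exit states, and concrete horizontal from production states). This is a minimal illustration of the structure just described, not the implementation used in our experiments; all class, field, and state names (AbstractState, h_concrete, "corridor-A", "go-east", and so on) are hypothetical.

from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class AbstractState:
    # One large oval in Fig. 11.14.
    name: str
    entry_states: Tuple[str, ...]        # small solid circles
    exit_states: Tuple[str, ...]         # small hollow circles
    production_states: Tuple[str, ...]   # small circles with arrows (primitive states)
    # Vertical transitions: entry state -> distribution over child states.
    vertical: Dict[str, Dict[str, float]] = field(default_factory=dict)
    # Concrete horizontal transitions: (production state, action) ->
    # distribution over sibling production/exit states.
    h_concrete: Dict[Tuple[str, str], Dict[str, float]] = field(default_factory=dict)

@dataclass
class HPOMDP:
    abstract_states: Dict[str, AbstractState]
    # Abstract horizontal transitions: (exit state, action) ->
    # distribution over entry states of other abstract states.
    h_abstract: Dict[Tuple[str, str], Dict[str, float]]
    # Observations and rewards are attached to production states.
    observation: Dict[Tuple[str, str], Dict[str, float]]   # (production state, action) -> P(observation)
    reward: Dict[Tuple[str, str], float]                    # (production state, action) -> expected reward

# Example: one corridor modelled as an abstract state with two entry/exit pairs.
corridor = AbstractState(
    name="corridor-A",
    entry_states=("enter-east", "enter-west"),
    exit_states=("exit-east", "exit-west"),
    production_states=("cell-1", "cell-2", "cell-3"),
    vertical={"enter-west": {"cell-1": 1.0}, "enter-east": {"cell-3": 1.0}},
    h_concrete={("cell-3", "go-east"): {"exit-east": 0.9, "cell-3": 0.1}},
)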
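The DBN encoding of Fig. 11.15 can likewise be read as a template repeated at every time step. The layout below is an assumed two-level slice given for illustration only (the variable names and parent sets are ours, not taken verbatim from [43]); it is meant to show the two differences noted above: the extra action node A_t, and an exit node X_t whose domain contains one value per exit state of the current abstract state plus a no-exit value, rather than being binary as in the HHMM DBN of [23].

# Assumed two-level DBN slice for an H-POMDP (illustrative only).
# Each variable is listed with the parents it would have in such an encoding.
DBN_SLICE = {
    "S_abs[t]": ["S_abs[t-1]", "X[t-1]", "A[t-1]"],              # abstract state
    "S_con[t]": ["S_con[t-1]", "S_abs[t]", "X[t-1]", "A[t-1]"],  # concrete (production) state
    "X[t]":     ["S_abs[t]", "S_con[t]"],   # exit node: {"no-exit", "exit-1", ..., "exit-k"}, not binary
    "A[t]":     [],                          # action node, set by the policy rather than inferred
    "O[t]":     ["S_con[t]"],                # observation emitted by the concrete state
}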

[1] Leslie Pack Kaelbling, et al. Representing hierarchical POMDPs as DBNs for multi-scale robot localization. ICRA 2004.

[2] Sridhar Mahadevan, et al. Approximate planning with hierarchical partially observable Markov decision process models for robot navigation. ICRA 2002.

[3] Balaraman Ravindran, et al. Model Minimization in Hierarchical Reinforcement Learning. SARA 2002.

[4] Sridhar Mahadevan, et al. Decision-Theoretic Planning with Concurrent Temporally Extended Actions. UAI 2001.

[5] Sridhar Mahadevan, et al. Continuous-Time Hierarchical Reinforcement Learning. ICML 2001.

[6] Sridhar Mahadevan, et al. A reinforcement learning model of selective visual attention. AGENTS '01, 2001.

[7] Kevin P. Murphy, et al. Linear-time inference in Hierarchical HMMs. NIPS 2001.

[8] Craig Boutilier, et al. Stochastic dynamic programming with factored representations. Artificial Intelligence, 2000.

[9] Michael I. Jordan, et al. Mixed Memory Markov Models: Decomposing Complex Stochastic Processes as Mixtures of Simpler Ones. Machine Learning, 1999.

[10] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999.

[11] Daphne Koller, et al. Computing Factored Value Functions for Policies in Structured MDPs. IJCAI 1999.

[12] Gang Wang, et al. Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes. ICML 1999.

[13] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 1999.

[14] Andrew G. Barto, et al. Elevator Group Control Using Multiple Reinforcement Learning Agents. Machine Learning, 1998.

[15] Victor R. Lesser, et al. Learning to Improve Coordinated Actions in Cooperative Distributed Problem-Solving Environments. Machine Learning, 1998.

[16] Yoram Singer, et al. The Hierarchical Hidden Markov Model: Analysis and Applications. Machine Learning, 1998.

[17] Kee-Eung Kim, et al. Solving Very Large Weakly Coupled Markov Decision Processes. AAAI/IAAI 1998.

[18] R. Simmons, et al. Xavier: A Robot Navigation Architecture Based on Partially Observable Markov Decision Process Models. 1998.

[19] Leslie Pack Kaelbling, et al. Learning Topological Maps with Weak Local Odometric Information. IJCAI 1997.

[20] Robert Givan, et al. Model Minimization in Markov Decision Processes. AAAI/IAAI 1997.

[21] Michael L. Littman. Markov Games as a Framework for Multi-Agent Reinforcement Learning. ICML 1994.

[22] Gerald Tesauro. Practical issues in temporal difference learning. Machine Learning, 1992.

[23] Craig A. Knoblock. An analysis of ABSTRIPS. 1992.

[24] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning. Artificial Intelligence, 1991.

[25] Keiji Kanazawa, et al. A model for reasoning about persistence and causation. 1989.

[26] Joelle Pineau, et al. A Hierarchical Approach to POMDP Planning and Execution. 2004.

[27] M. Tan. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. 2003.

[28] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems, 2003.

[29] Sridhar Mahadevan, et al. Hierarchical learning and planning in partially observable Markov decision processes. 2002.

[30] Sridhar Mahadevan, et al. Learning Hierarchical Partially Observable Markov Decision Process Models for Robot Navigation. 2001.

[31] J. Adams. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence (A Review). 2001.

[32] Sridhar Mahadevan, et al. Hierarchical Memory-Based Reinforcement Learning. NIPS 2000.

[33] Andrew McCallum, et al. Information Extraction with HMMs and Shrinkage. 1999.

[34] Ronald E. Parr. Hierarchical control and learning for Markov decision processes. 1998.

[35] Richard Hughey, et al. Hidden Markov models for detecting remote protein homologies. Bioinformatics, 1998.

[36] Frederick Jelinek. Statistical methods for speech recognition. 1997.

[37] Andrew McCallum. Reinforcement learning with selective perception and hidden state. 1996.