Action Dependent State Space Abstraction for Hierarchical Learning Systems

To operate effectively in complex environments, learning agents must selectively ignore irrelevant details by forming useful abstractions. In this paper we outline a formulation of abstraction for reinforcement learning approaches to stochastic decision problems by extending one of the recent minimization models, known as ε-reduction. The technique presented here extends ε-reduction to SMDPs by executing a policy instead of a single action, grouping together all states whose transition probabilities and reward functions differ by only a small amount under that policy. For cases where the reward structure is not known in advance, or where multiple tasks must be learned in the same environment, we introduce a two-phase method for state aggregation and prove a theorem showing that tasks remain solvable on the partitions produced by the two-phase method. Simulations on different state spaces show that policies learned in the original MDP and in the abstract representation achieve similar results, while the total learning time in the partition space is much smaller than the time spent on learning in the original state space.
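
As a rough illustration of the grouping criterion, the sketch below is not taken from the paper: the arrays P and R, the representative-based comparison, and the function name epsilon_reduce are assumptions for illustration only. It clusters states whose one-step expected rewards and transition probabilities under a fixed policy differ by at most epsilon. The paper's actual ε-reduction compares probabilities aggregated over blocks of the partition and refines the partition until it is stable; this greedy, pairwise version only conveys the flavour of the construction.

    import numpy as np

    def epsilon_reduce(P, R, epsilon):
        """Group states whose one-step behaviour under a fixed policy differs
        by at most epsilon (hypothetical illustration, not the paper's algorithm).

        P : (n, n) array, P[s, s'] = transition probability under the policy
        R : (n,) array, expected one-step reward under the policy
        Returns a list of blocks, each a list of state indices.
        """
        n = len(R)
        blocks = []
        for s in range(n):
            placed = False
            for block in blocks:
                rep = block[0]  # compare against the block's first member
                close_reward = abs(R[s] - R[rep]) <= epsilon
                close_trans = np.max(np.abs(P[s] - P[rep])) <= epsilon
                if close_reward and close_trans:
                    block.append(s)
                    placed = True
                    break
            if not placed:
                blocks.append([s])  # state starts a new block
        return blocks

In the SMDP setting described in the abstract, P and R would be induced by executing an entire policy (a temporally extended action) rather than a single primitive action, but the thresholding idea is the same.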
