Solving Hybrid Markov Decision Processes

Markov decision processes (MDPs) have become a standard for representing uncertainty in decision-theoretic planning. However, MDPs require an explicit representation of the state space and the probabilistic transition model, which are not always easy to define in continuous or hybrid continuous-discrete domains. Even when such a representation is available, the size of the state space and the number of state variables involved in the transition function may make the resulting MDP intractable for traditional techniques. In this paper we present a reward-based abstraction for solving hybrid MDPs. The proposed method gathers information about the rewards and the dynamics of the system by exploring the environment. This information is used to build a decision tree (C4.5) that represents a small set of abstract states with equivalent rewards, and then to learn a probabilistic transition function over those abstract states with a Bayesian network learning algorithm (K2). The output is a problem specification ready to be solved with traditional dynamic programming algorithms. We have tested our abstract MDP approximation in real-world problem domains, and we report the learned models and their solutions for different configurations, showing that our approach quickly produces satisfactory policies.
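The pipeline described above can be illustrated with a minimal sketch. The sketch below is not the paper's implementation: scikit-learn's CART tree stands in for C4.5, simple Laplace-smoothed frequency counts over abstract states stand in for the K2 Bayesian network learner, and the sample format and function names are hypothetical. It shows the three stages the abstract describes: grouping sampled states by reward into abstract states with a decision tree, estimating an abstract transition model from the exploration data, and solving the resulting discrete MDP with value iteration.

```python
# A minimal sketch of the reward-based abstraction pipeline, under the
# assumptions stated above (CART instead of C4.5, counting instead of K2).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_abstract_mdp(samples, n_abstract_states=16, gamma=0.95):
    """samples: list of (state_vector, action_id, reward, next_state_vector),
    with integer action_id in 0..n_a-1 (hypothetical exploration-trace format)."""
    X = np.array([s for s, a, r, s2 in samples])
    X2 = np.array([s2 for s, a, r, s2 in samples])
    rewards = np.array([r for s, a, r, s2 in samples])
    actions = np.array([a for s, a, r, s2 in samples])

    # 1) Group states with (near-)equivalent rewards using a decision tree
    #    over the continuous state variables; each leaf is one abstract state.
    reward_labels = np.digitize(rewards, np.unique(np.round(rewards, 2)))
    tree = DecisionTreeClassifier(max_leaf_nodes=n_abstract_states,
                                  min_samples_leaf=5).fit(X, reward_labels)
    leaves = tree.apply(X)        # abstract state of each sampled state
    next_leaves = tree.apply(X2)  # abstract state of each next state
    index = {leaf: i for i, leaf in
             enumerate(np.unique(np.concatenate([leaves, next_leaves])))}
    n_s, n_a = len(index), actions.max() + 1

    # 2) Estimate P(s'|s,a) and R(s) over abstract states by counting
    #    observed transitions (a stand-in for learning the transition network).
    T = np.ones((n_a, n_s, n_s))  # Laplace-smoothed transition counts
    R, R_count = np.zeros(n_s), np.zeros(n_s)
    for (s, a, r, s2), leaf, nleaf in zip(samples, leaves, next_leaves):
        i, j = index[leaf], index[nleaf]
        T[a, i, j] += 1.0
        R[i] += r
        R_count[i] += 1.0
    T /= T.sum(axis=2, keepdims=True)
    R /= np.maximum(R_count, 1.0)

    # 3) Solve the abstract MDP with value iteration (dynamic programming).
    V = np.zeros(n_s)
    for _ in range(500):
        Q = R[None, :] + gamma * T @ V   # shape (n_a, n_s)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < 1e-6:
            break
        V = V_new
    policy = Q.argmax(axis=0)            # action per abstract state
    return tree, index, policy, V
```

At execution time, a concrete continuous state would be mapped to its abstract state with `tree.apply` and the same `index` lookup, and the corresponding `policy` entry gives the action to take.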
