Conversion of MDP problems into heuristics-based planning problems using temporal decomposition

This paper presents an approach for recasting Markov Decision Process (MDP) problems as heuristics-based planning problems. The central idea is a temporal decomposition of the state space driven by a subset of states referred to as the termination sample space. The recasting proceeds in three steps: first, a state space adaptation criterion is defined based on the termination sample space; second, an action selection heuristic is defined for each state; third, a recursion (backtracking) methodology is defined to avoid dead ends and infinite loops. All three steps are described and discussed, and a sketch of how they fit together is given below. A case study involving fault detection and alarm generation for the reaction wheels of a satellite mission is presented, and the proposed approach is compared with existing approaches for recasting MDP problems on this case study. The results demonstrate the computational reduction achieved by the proposed approach.
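
To make the three steps concrete, the following is a minimal Python sketch that walks a toy fault-detection MDP through them. Everything here is an illustrative assumption rather than the paper's actual construction: the state names, the transition table, the heuristic values, and the helper names (most_likely, adapted, ranked_actions, plan) are all invented for this example.

```python
# Minimal sketch of the three-step recasting on a toy fault-detection MDP.
# The transition table, heuristic values, and state names are illustrative
# assumptions; the paper's actual construction may differ.

# Termination sample space: the subset of states at which planning stops.
TERMINATION_STATES = {"alarm_raised"}

# Toy model: state -> action -> list of (next_state, probability) pairs.
SUCCESSORS = {
    "nominal":         {"monitor": [("degraded", 0.6), ("nominal", 0.4)]},
    "degraded":        {"test":    [("fault_confirmed", 0.8), ("degraded", 0.2)],
                        "ignore":  [("nominal", 0.5), ("degraded", 0.5)]},
    "fault_confirmed": {"alert":   [("alarm_raised", 1.0)]},
}

# Heuristic: estimated number of steps to the termination sample space.
HEURISTIC = {"nominal": 3, "degraded": 2, "fault_confirmed": 1,
             "alarm_raised": 0}


def most_likely(outcomes):
    """Collapse a stochastic outcome to its most probable successor state."""
    return max(outcomes, key=lambda sp: sp[1])[0]


def adapted(state):
    """Step 1: state space adaptation -- keep only states that belong to the
    termination sample space or have at least one outgoing action."""
    return state in TERMINATION_STATES or bool(SUCCESSORS.get(state))


def ranked_actions(state):
    """Step 2: action selection heuristic -- order actions by the heuristic
    value of their most likely successor (best first)."""
    actions = SUCCESSORS[state]
    return sorted(actions, key=lambda a: HEURISTIC[most_likely(actions[a])])


def plan(state, path=(), visited=frozenset()):
    """Step 3: recursive search with backtracking; skipping visited states
    avoids infinite loops, and returning None from an exhausted state lets
    the caller back out of dead ends."""
    if state in TERMINATION_STATES:
        return list(path)
    visited = visited | {state}
    for action in ranked_actions(state):
        nxt = most_likely(SUCCESSORS[state][action])
        if nxt in visited or not adapted(nxt):
            continue
        result = plan(nxt, path + ((action, nxt),), visited)
        if result is not None:
            return result
    return None  # dead end: caller backtracks to the next-best action


if __name__ == "__main__":
    # Expected: [('monitor', 'degraded'), ('test', 'fault_confirmed'),
    #            ('alert', 'alarm_raised')]
    print(plan("nominal"))
```

In this sketch the stochastic MDP is deliberately flattened by following only the most likely successor of each action, which is what turns the problem into a deterministic heuristic search; if the heuristically preferred branch dead-ends or revisits a state, the recursion falls back to the next-ranked action.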
