A Continuous-Time Markov Decision Process-Based Method With Application in a Pursuit-Evasion Example

This paper presents a novel method, the continuous-time Markov decision process (CTMDP), to address the uncertainties in the pursuit-evasion problem. The primary difference between the CTMDP and the Markov decision process (MDP) is that the former takes into account the influence of the transition time between states. A policy iteration method based on performance potentials for solving the CTMDP is presented, together with its convergence. The results obtained by the MDP-based method show that it is a special case of the CTMDP-based method involving the identity transition rate matrix. To compare the methods, a well-known pursuit-evasion problem involving two identical cars is solved as a benchmark. The CTMDP-based method provides a discretized solution that is close to the analytical solution obtained by the differential game method. In addition, it is more robust to changes in the transition probability than the traditional MDP-based method. To the best of our knowledge, this is the first attempt to validate the influence of the transition time between states in such a pursuit-evasion scenario, or in a similar application, solved by an MDP-related model. The CTMDP-based method offers a new approach to solving the pursuit-evasion problem and can be extended to similar optimization applications.
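To make the role of transition rates concrete, the following is a minimal sketch of policy iteration for a small finite CTMDP. It is not the paper's algorithm: it assumes an infinite-horizon discounted criterion with discount rate alpha rather than the performance-potential formulation mentioned above, and every name in it (solve_linear, policy_iteration, RATES, REWARDS) and every number in the two-state example is hypothetical. Under these assumptions, a policy d is evaluated by solving (alpha*I - Q_d) V = r_d, where Q_d is the transition rate matrix under d, and improved by maximizing r(i, a) + sum_j q(j | i, a) V(j) in each state i.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def policy_iteration(rates, rewards, alpha):
    """Policy iteration for a discounted CTMDP (an assumed criterion).

    rates[a][i][j]: transition rate from state i to j under action a;
                    each row sums to zero.
    rewards[a][i]:  reward rate earned in state i under action a.
    alpha:          discount rate, must be positive.
    """
    n_actions, n_states = len(rates), len(rates[0])
    policy = [0] * n_states
    while True:
        # Policy evaluation: solve (alpha*I - Q_d) V = r_d.
        a_mat = [[(alpha if i == j else 0.0) - rates[policy[i]][i][j]
                  for j in range(n_states)] for i in range(n_states)]
        r_vec = [rewards[policy[i]][i] for i in range(n_states)]
        values = solve_linear(a_mat, r_vec)

        def score(i, a):
            # One-step improvement criterion for action a in state i.
            return rewards[a][i] + sum(
                rates[a][i][j] * values[j] for j in range(n_states))

        # Policy improvement; keep the current action on ties so the
        # loop cannot cycle between equally good policies.
        new_policy = []
        for i in range(n_states):
            best = max(range(n_actions), key=lambda a: score(i, a))
            if score(i, best) <= score(i, policy[i]) + 1e-12:
                best = policy[i]
            new_policy.append(best)
        if new_policy == policy:
            return policy, values
        policy = new_policy


# Hypothetical two-state, two-action example (numbers are illustrative).
RATES = [
    [[-1.0, 1.0], [1.0, -1.0]],   # action 0
    [[-3.0, 3.0], [0.5, -0.5]],   # action 1: reaches state 1 faster, leaves it slower
]
REWARDS = [
    [0.0, 1.0],    # action 0
    [-0.5, 1.0],   # action 1 costs effort in state 0
]
policy, values = policy_iteration(RATES, REWARDS, alpha=0.1)
print(policy, values)
```

In this toy example the two actions differ mainly in how fast they move the process between states, which is exactly the information a discrete-time MDP without transition times would not carry.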
