State Classification of Time-Nonhomogeneous Markov Chains and Average Reward Optimization of Multi-Chains

In a discrete-time nonhomogeneous Markov chain (TNHMC), the state spaces, transition probabilities, and reward functions may differ from one time to another. In this paper, using the notion of confluencity introduced previously, we show that the states of a TNHMC can be classified into branching states and a number of classes of confluent states (the counterparts of the transient and recurrent states in the time-homogeneous case). The optimization of the average reward in TNHMCs consisting of a single confluent class (uni-chains) has been addressed in a previous paper by the author. In this paper, we show that, with confluencity, the state classification, and some boundedness conditions, we can obtain necessary and sufficient conditions for average-reward optimal policies of TNHMCs consisting of multiple confluent classes (multi-chains). As in the uni-chain case, the sufficient condition need not hold on any “zero-frequently-visited” time sequence. This “under-selectivity” makes the problem not amenable to dynamic programming, so a direct comparison-based approach is used to prove the results. The results enhance our understanding of state classification and performance optimization through the notion of confluencity.
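The confluencity property underlying the state classification can be probed empirically. The sketch below is a minimal illustration, assuming (as a working definition, not a quotation from the paper) that two states are confluent at a given time if two independent copies of the chain started in those states occupy the same state at some later time with probability 1; all function and variable names here are hypothetical.

```python
import random

def estimate_meeting_prob(P_seq, i, j, horizon, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that two independent
    copies of a time-nonhomogeneous Markov chain, started in states
    i and j at time 0, meet (occupy the same state) before `horizon`.
    P_seq(t) returns the transition matrix (list of row lists) at time t."""
    rng = random.Random(seed)

    def step(state, P):
        # Sample the next state from row `state` of the matrix P.
        r, acc = rng.random(), 0.0
        for s, p in enumerate(P[state]):
            acc += p
            if r < acc:
                return s
        return len(P[state]) - 1

    meets = 0
    for _ in range(trials):
        x, y = i, j
        for t in range(horizon):
            if x == y:              # the two copies have met
                meets += 1
                break
            P = P_seq(t)            # transition law varies with time t
            x, y = step(x, P), step(y, P)
        else:
            if x == y:
                meets += 1
    return meets / trials

# Example: a two-state chain whose transition probabilities drift with time.
def P_seq(t):
    p = 0.5 + 0.4 / (t + 2)         # time-varying "stay" probability
    return [[p, 1 - p], [1 - p, p]]

est = estimate_meeting_prob(P_seq, 0, 1, horizon=50)
```

For this example chain the two copies meet almost surely well before the horizon, so the estimate is close to 1; an estimate bounded away from 1 as the horizon grows would instead suggest the two starting states belong to different confluent classes.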
