Reduction of Markov Chains Using a Value-of-Information-Based Approach

In this paper, we propose an approach for obtaining reduced-order models of Markov chains. Our approach is composed of two information-theoretic processes. The first is a means of comparing pairs of stationary chains on different state spaces, which is done via the negative, modified Kullback–Leibler divergence defined on a model joint space. Model reduction is achieved by solving a value-of-information criterion with respect to this divergence. Optimizing the criterion leads to a probabilistic partitioning of the states of the high-order Markov chain. A single free parameter that emerges through the optimization process dictates both the partition uncertainty and the number of state groups. We provide a data-driven means of choosing the ‘optimal’ value of this free parameter, which sidesteps the need to know a priori the number of state groups in an arbitrary chain.