Reduction of Markov Chains Using a Value-of-Information-Based Approach

In this paper, we propose an approach for obtaining reduced-order models of Markov chains. Our approach is composed of two information-theoretic processes. The first is a means of comparing pairs of stationary chains on different state spaces, which is done via the negative, modified Kullback–Leibler divergence defined on a model joint space. Model reduction is achieved by solving a value-of-information criterion with respect to this divergence. Optimizing the criterion leads to a probabilistic partitioning of the states of the high-order Markov chain. A single free parameter that emerges through the optimization process dictates both the partition uncertainty and the number of state groups. We provide a data-driven means of choosing the ‘optimal’ value of this free parameter, which sidesteps the need to know a priori the number of state groups in an arbitrary chain.