Shannon meets Bellman: Feature-based Markovian models for detection and optimization

The goal of this paper is to develop modeling techniques for complex systems for the purposes of control, estimation, and inference: (i) A new class of hidden Markov models is introduced, called the optimal feature prediction (OFP) model. It is similar to a Gaussian mixture model, except that the process's actual marginal distribution is used in place of a Gaussian distribution. This structure leads to simple learning algorithms for finding an optimal model. (ii) The OFP model unifies other modeling approaches, including the projective methods of Shannon, Mori and Zwanzig, and Chorin, as well as a version of the binning technique for Markov model reduction. (iii) Several general applications are surveyed, including inference and optimal control. Computation of the spectrum, or of solutions to dynamic programming equations, reduces to a finite-dimensional matrix calculation that does not require knowledge of the underlying marginal distribution on which the model is based.
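As an illustration of item (iii), the sketch below shows how, once a feature map is fixed, both the spectrum of a reduced model and the solution of a discounted dynamic programming equation reduce to computations with small matrices. This is a minimal numerical sketch, not the paper's OFP construction: the transition matrix P, cost vector c, feature matrix Psi, and discount factor alpha are synthetic placeholders, and the projection used is the standard Galerkin/least-squares one.

```python
import numpy as np

# Synthetic placeholders: a small Markov chain with transition matrix P,
# a one-step cost c, and a feature matrix Psi (one feature vector per state).
rng = np.random.default_rng(0)
n, d = 200, 5                      # number of states, number of features
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)  # row-normalize to get a valid transition matrix
c = rng.random(n)                  # one-step cost
Psi = rng.random((n, d))           # feature vectors psi(x) as rows

# Projected transition matrix: the d x d Galerkin projection of P onto the
# feature subspace, (Psi^T Psi)^{-1} Psi^T P Psi.
G = np.linalg.solve(Psi.T @ Psi, Psi.T @ (P @ Psi))

# Spectrum of the reduced model: eigenvalues of the d x d matrix only.
eigvals = np.linalg.eigvals(G)

# Discounted dynamic programming in the reduced coordinates: solve
# Psi theta = c + alpha * P Psi theta in the least-squares sense, i.e.
# theta = (Psi^T (Psi - alpha P Psi))^{-1} Psi^T c.
alpha = 0.95
theta = np.linalg.solve(Psi.T @ (Psi - alpha * (P @ Psi)), Psi.T @ c)
V = Psi @ theta                    # approximate value function on the full state space

print("reduced spectrum:", np.round(eigvals, 3))
print("value function range:", V.min(), V.max())
```

Every linear-algebra step above involves only d x d matrices, which is the sense in which the computation is finite dimensional even when the underlying state space is large.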

[1] S. Eddy, Hidden Markov models, Current Opinion in Structural Biology, 1996.

[2] John N. Tsitsiklis et al., Actor-Critic Algorithms, NIPS, 1999.

[3] Francisco S. Melo et al., Convergence of Q-learning with linear function approximation, European Control Conference (ECC), 2007.

[4] Sean P. Meyn, Control Techniques for Complex Networks: Workload, 2007.

[5] Alexandre J. Chorin et al., Optimal prediction with memory, 2002.

[6] R. Zwanzig, Nonequilibrium Statistical Mechanics, Oxford University Press, 2001.

[7] Ronald A. Howard, Dynamic Programming and Markov Processes, MIT Press, 1960.

[8] V. Borkar, Dynamic programming for ergodic control with partial observations, 2003.

[9] Sean P. Meyn et al., The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, SIAM J. Control Optim., 2000.

[10] Steven A. Lippman et al., Applying a New Device in the Optimization of Exponential Queuing Systems, Oper. Res., 1975.

[11] S. Meyn et al., Spectral theory and limit theorems for geometrically ergodic Markov processes, arXiv:math/0209200, 2002.

[12] E. Altman, Constrained Markov Decision Processes, 1999.

[13] Leandros Tassiulas et al., Jointly optimal routing and scheduling in packet radio networks, IEEE Trans. Inf. Theory, 1992.

[14] E. M. et al., Statistical Mechanics.

[15] Richard S. Sutton et al., Reinforcement Learning: An Introduction, MIT Press, 1998.

[16] Vivek S. Borkar et al., Average Cost Dynamic Programming Equations for Controlled Markov Chains with Partial Observations, SIAM J. Control Optim., 2000.

[17] S. Varadhan et al., Asymptotic evaluation of certain Markov process expectations for large time, 1975.

[18] C. E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal, 1948.

[19] John N. Tsitsiklis et al., Neuro-Dynamic Programming, Athena Scientific, 1996.

[20] Liming Xiang et al., Kernel-Based Reinforcement Learning, ICIC, 2006.

[21] Eitan Altman et al., Sensitivity of constrained Markov decision processes, Ann. Oper. Res., 1991.

[22] A. J. Chorin et al., Optimal prediction and the Mori-Zwanzig representation of irreversible processes, Proc. Natl. Acad. Sci. USA, 2000.

[23] John B. Moore et al., Hidden Markov Models: Estimation and Control, 1994.

[24] P. Schweitzer, Perturbation theory and finite Markov chains, 1968.

[25] Vivek S. Borkar et al., Actor-Critic-Type Learning Algorithms for Markov Decision Processes, SIAM J. Control Optim., 1999.

[26] Richard L. Tweedie et al., Markov Chains and Stochastic Stability, Communications and Control Engineering Series, 1993.

[27] M. Uschold et al., Methods and applications, 1953.

[28] Alexandre J. Chorin et al., Non-Markovian Optimal Prediction, Monte Carlo Methods Appl., 2001.

[29] H. Mori, Transport, Collective Motion, and Brownian Motion, 1965.

[30] Eugene A. Feinberg et al., Handbook of Markov Decision Processes, 2002.

[31] Sean P. Meyn, The policy iteration algorithm for average reward Markov decision processes with general state space, IEEE Trans. Autom. Control, 1997.

[32] John N. Tsitsiklis et al., Feature-based methods for large scale dynamic programming, Machine Learning, 2004.

[33] Sean P. Meyn et al., An analysis of reinforcement learning with function approximation, ICML, 2008.