Learning Markov Models Via Low-Rank Optimization

Taming high-dimensional Markov models

In “Learning Markov models via low-rank optimization”, Z. Zhu, X. Li, M. Wang, and A. Zhang study how to learn a high-dimensional Markov model with low-dimensional latent structure from a single trajectory of states. To overcome the curse of dimensionality, the authors augment the standard maximum-likelihood estimator (MLE) with either nuclear-norm regularization or a rank constraint. They show that both approaches estimate the full transition matrix accurately from a trajectory whose length is merely proportional to the number of states. Because the rank-constrained MLE is a nonconvex problem, the authors develop a new DC (difference of convex functions) programming algorithm to solve it. Finally, they apply the proposed methods to taxi trips on Manhattan Island and partition the island according to customers' destination preferences; this partition can help balance the supply and demand of taxi service and optimize the allocation of traffic resources.
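To make the setting concrete, the following is a minimal sketch of estimating a low-rank transition matrix from a single trajectory. It is not the authors' nuclear-norm or rank-constrained MLE, and it does not implement their DC algorithm; as a stand-in, it computes the unregularized MLE (row-normalized transition counts) and then applies a simple truncated-SVD heuristic followed by clipping and row renormalization. All function names, the uniform fallback for unvisited states, and the problem sizes are assumptions chosen for illustration.

```python
import numpy as np


def simulate_chain(P, n_steps, rng, start=0):
    """Sample one trajectory of n_steps transitions from transition matrix P."""
    states = np.empty(n_steps + 1, dtype=int)
    states[0] = start
    for t in range(n_steps):
        states[t + 1] = rng.choice(P.shape[0], p=P[states[t]])
    return states


def empirical_transition_matrix(states, p):
    """Unregularized MLE: row-normalized counts of observed transitions."""
    counts = np.zeros((p, p))
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Assumption: rows of states that were never left fall back to uniform.
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1.0), 1.0 / p)


def truncated_svd_estimate(P_hat, r):
    """Heuristic stand-in for a rank constraint (not the paper's estimator).

    Keeps the top-r singular components, clips negatives, and renormalizes
    rows so the result is again a stochastic matrix. After clipping, the
    rank is only approximately r.
    """
    U, s, Vt = np.linalg.svd(P_hat, full_matrices=False)
    low = np.clip((U[:, :r] * s[:r]) @ Vt[:r], 0.0, None)
    row_sums = low.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    return np.where(row_sums > 0, low / safe, 1.0 / low.shape[1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, r, n_steps = 20, 3, 2000  # illustrative sizes only

    # Ground-truth rank-r transition matrix: product of two stochastic factors.
    U_true = rng.dirichlet(np.ones(r), size=p)  # p x r, rows sum to 1
    V_true = rng.dirichlet(np.ones(p), size=r)  # r x p, rows sum to 1
    P_true = U_true @ V_true

    states = simulate_chain(P_true, n_steps, rng)
    P_mle = empirical_transition_matrix(states, p)
    P_low = truncated_svd_estimate(P_mle, r)

    print("Frobenius error, plain MLE:     ", np.linalg.norm(P_mle - P_true))
    print("Frobenius error, truncated SVD: ", np.linalg.norm(P_low - P_true))
```

The truncated SVD is used here only because it is short and easy to check; the estimators described above instead build the low-rank structure into the likelihood itself, through a nuclear-norm penalty or an explicit rank constraint.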
