Paper information - Learning Markov Models Via Low-Rank Optimization

Learning Markov Models Via Low-Rank Optimization

Taming high-dimensional Markov models

In “Learning Markov models via low-rank optimization”, Z. Zhu, X. Li, M. Wang, and A. Zhang study how to learn a high-dimensional Markov model with low-dimensional latent structure from a single trajectory of states. To overcome the curse of dimensionality, the authors equip the standard maximum-likelihood estimator (MLE) with either nuclear-norm regularization or a rank constraint. They show that both approaches estimate the full transition matrix accurately from a trajectory whose length is merely proportional to the number of states. The rank-constrained MLE is a nonconvex problem, and to solve it the authors develop a new DC (difference of convex functions) programming algorithm. Finally, they apply the proposed methods to taxi trips on Manhattan island and partition the island according to the destination preferences of customers; this partition can help balance the supply and demand of taxi service and optimize the allocation of traffic resources.
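To make the setting concrete, the following is a minimal sketch, not the authors' method. It assumes states are encoded as integers 0, ..., p-1, computes the unconstrained MLE (row-normalized transition counts) from a single trajectory, and then applies a plain SVD truncation as a simple stand-in for the nuclear-norm-regularized and rank-constrained estimators described above. All function names are hypothetical and chosen for illustration only.

```python
import numpy as np


def empirical_transition_matrix(trajectory, num_states):
    """Unconstrained MLE: row-normalized transition counts from one trajectory."""
    counts = np.zeros((num_states, num_states))
    for s, t in zip(trajectory[:-1], trajectory[1:]):
        counts[s, t] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Assumption: states with no observed outgoing transition get a uniform row.
    uniform = np.full((num_states, num_states), 1.0 / num_states)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    return np.where(row_sums > 0, counts / safe, uniform)


def rank_truncate(P, rank):
    """Keep the top `rank` singular components, then clip and renormalize rows.

    The clipping step restores a valid transition matrix, so the result is
    only approximately of the requested rank.
    """
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    low_rank = np.clip(low_rank, 1e-12, None)
    return low_rank / low_rank.sum(axis=1, keepdims=True)


def negative_log_likelihood(P, trajectory):
    """Average negative log-likelihood of the observed transitions under P."""
    src = np.asarray(trajectory[:-1])
    dst = np.asarray(trajectory[1:])
    return -np.mean(np.log(P[src, dst]))


if __name__ == "__main__":
    trajectory = [0, 1, 0, 1, 2, 0]
    P_hat = empirical_transition_matrix(trajectory, num_states=3)
    P_low = rank_truncate(P_hat, rank=1)
    print(P_hat)
    print(P_low)
    print(negative_log_likelihood(P_hat, trajectory))
```

The negative log-likelihood here is the objective that the summary says is augmented with a nuclear-norm penalty or a rank constraint; the SVD truncation only illustrates what a low-rank transition estimate looks like and carries none of the paper's guarantees.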
