Efficient inference of hidden Markov models from large observation sequences

The hidden Markov model (HMM) is widely used to model time series data. However, the conventional Baum-Welch algorithm is known to perform poorly when applied to long observation sequences. The literature contains several alternatives that seek to improve the memory or time complexity of the algorithm. However, for an HMM with N states and an observation sequence of length T, these alternatives require at best O(N) space and O(N^2 T) time. Given the growing number of applications that deal with massive amounts of data, an alternative whose running time is O(T) + poly(N) is desirable. Recent research presents an alternative to the Baum-Welch algorithm that relies on nonnegative matrix factorization. This paper examines the space complexity of that approach and proposes further optimizations using techniques adopted from the matrix sketching literature. The result is a streaming algorithm whose space complexity is constant, and whose time complexity is linear, in the length of the observation sequence. The paper also presents a batch algorithm that allows for further improved space complexity at the expense of an additional pass over the observation sequence.
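To make the O(N) space and O(N^2 T) time figures concrete, the following is a minimal sketch of the scaled forward recursion, the likelihood-evaluation pass that Baum-Welch builds on. It is not the paper's NMF-based method and not full Baum-Welch. The function name and the parameter names `pi`, `A`, `B`, and `obs` (initial distribution, transition matrix, emission matrix, and integer-coded observations) are assumptions chosen for illustration.

```python
import math


def forward_log_likelihood(pi, A, B, obs):
    """Return log P(obs) for a discrete HMM via the scaled forward recursion.

    pi[i]   : initial probability of state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    obs     : non-empty sequence of integer symbols
    Assumes every prefix of obs has nonzero probability under the model.
    """
    n = len(pi)
    # Only the current time step is stored, so working space is O(N).
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    alpha = [a / scale for a in alpha]
    log_lik = math.log(scale)

    for o in obs[1:]:
        # One N x N matrix-vector product per observation: O(N^2) time per
        # step, hence O(N^2 T) over the whole sequence.
        alpha = [
            B[j][o] * sum(alpha[i] * A[i][j] for i in range(n))
            for j in range(n)
        ]
        scale = sum(alpha)
        # Rescaling keeps alpha from underflowing on long sequences.
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)

    return log_lik
```

Each observation is consumed once and then discarded, so this pass is already streaming in T; the per-step N^2 factor is the cost that an O(T) + poly(N) method would need to avoid.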
