Clustering Markov States into Equivalence Classes using SVD and Heuristic Search Algorithms

This paper investigates the problem of finding a K-state first-order Markov chain that approximates an M -state first-order Markov chain, where K is typically much smaller than M . A variety of greedy heuristic search algorithms that maximize the data likelihood are investigated and found to work well empirically. The proposed algorithms are demonstrated on two applications: learning user models from traces of Unix commands, and word segmentation in language modeling.