Paper information - X-mHMM: an efficient algorithm for training mixtures of HMMs when the number of mixtures is unknown


In this paper we consider sequence clustering problems and propose an algorithm, based on the X-means algorithm, for estimating the number of clusters. The sequences are modeled using mixtures of Hidden Markov Models. We analyze the proposed algorithm through experiments with synthetic data. The algorithm proved to be both computationally efficient and capable of providing accurate estimates of the number of clusters. Some results of experiments with real-world Web-log data are also given.
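The abstract does not spell out how the number of clusters is chosen. As a rough illustration only, the sketch below shows the kind of BIC-based split test that X-means-style procedures rely on, applied to mixtures of discrete-output HMMs. It is not the authors' implementation: the function names, the use of the number of sequences as the BIC sample size, and the assumption that every component has the same number of states and symbols are all hypothetical choices made for this example. The log-likelihoods are assumed to come from an HMM training routine that is not shown.

```python
import math


def hmm_free_params(n_states, n_symbols):
    """Free parameters of one discrete-output HMM (each probability row sums to 1)."""
    initial = n_states - 1                    # initial state distribution
    transitions = n_states * (n_states - 1)   # one row per state
    emissions = n_states * (n_symbols - 1)    # one row per state
    return initial + transitions + emissions


def mixture_free_params(n_components, n_states, n_symbols):
    """Free parameters of a mixture of equally sized HMMs, plus the mixture weights."""
    return n_components * hmm_free_params(n_states, n_symbols) + (n_components - 1)


def bic(log_likelihood, n_params, n_sequences):
    """BIC in the 'higher is better' form: log-likelihood minus a complexity penalty.

    Assumption: the sample size is the number of sequences, not the number of symbols.
    """
    return log_likelihood - 0.5 * n_params * math.log(n_sequences)


def should_split(parent_loglik, children_loglik, n_states, n_symbols, n_sequences):
    """Accept a split of one component into two only if the BIC score improves.

    parent_loglik: log-likelihood of the cluster's sequences under a single HMM.
    children_loglik: log-likelihood of the same sequences under a two-HMM mixture.
    """
    parent_score = bic(parent_loglik, mixture_free_params(1, n_states, n_symbols), n_sequences)
    children_score = bic(children_loglik, mixture_free_params(2, n_states, n_symbols), n_sequences)
    return children_score > parent_score
```

In such a scheme, a small likelihood gain is rejected because the second HMM roughly doubles the parameter penalty, while a large gain is accepted. For instance, `should_split(-100.0, -99.0, 2, 3, 50)` rejects the split and `should_split(-200.0, -100.0, 2, 3, 50)` accepts it.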

Csaba Szepesvári | Zoltán Szamonek
