X-mHMM: an efficient algorithm for training mixtures of HMMs when the number of mixtures is unknown

In this paper we consider sequence clustering problems and propose an algorithm, based on the X-means algorithm, for estimating the number of clusters. The sequences are modeled using mixtures of Hidden Markov Models (HMMs). We analyze the proposed algorithm through experiments with synthetic data, in which it proved to be both computationally efficient and capable of providing accurate estimates of the number of clusters. Some results of experiments with real-world Web-log data are also given.
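
As a rough illustration of the X-means idea applied to HMM mixtures, the sketch below shows a split test of the kind X-means relies on: a cluster is split in two only if the two-component model scores better under the Bayesian Information Criterion (BIC). The abstract does not state the exact criterion used by X-mHMM, so the BIC form (the one from the original X-means formulation), the choice of discrete-emission HMMs, the use of the number of sequences as the sample size, and all function names here are assumptions made for illustration only.

```python
import math


def hmm_free_params(n_states: int, n_symbols: int) -> int:
    """Free parameters of one discrete-emission HMM (assumed model class)."""
    initial = n_states - 1                    # initial state distribution
    transitions = n_states * (n_states - 1)   # each transition row sums to 1
    emissions = n_states * (n_symbols - 1)    # each emission row sums to 1
    return initial + transitions + emissions


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """BIC in the 'larger is better' form: l(D) - (p / 2) * log(R)."""
    return log_likelihood - 0.5 * n_params * math.log(n_samples)


def should_split(parent_ll: float, children_ll: float,
                 n_states: int, n_symbols: int, n_sequences: int) -> bool:
    """Hypothetical split test: keep two child HMMs only if BIC improves.

    parent_ll   -- log-likelihood of the cluster's sequences under one HMM
    children_ll -- log-likelihood of the same sequences under a 2-HMM mixture
    """
    p_one = hmm_free_params(n_states, n_symbols)
    p_two = 2 * p_one + 1                     # two HMMs plus one mixing weight
    return bic(children_ll, p_two, n_sequences) > bic(parent_ll, p_one, n_sequences)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(should_split(parent_ll=-520.0, children_ll=-480.0,
                       n_states=2, n_symbols=3, n_sequences=200))
```

In such a scheme the log-likelihoods would come from training the one-HMM and two-HMM models on the sequences currently assigned to the cluster, and the test would be repeated per cluster until no split improves the score.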