Clustering sequence data using hidden Markov model representation

This paper proposes a clustering methodology for sequence data that uses a hidden Markov model (HMM) representation. The proposed methodology improves upon existing HMM-based clustering methods in two ways: (i) it enables HMMs to change their model structure dynamically during the clustering process, so that each cluster obtains a better-fitting model for its data, and (ii) it provides an objective criterion function for selecting the optimal clustering partition. The algorithm is presented in terms of four nested levels of search: (i) the search for the optimal number of clusters in a partition, (ii) the search for the optimal structure for a given partition, (iii) the search for the optimal HMM structure for each cluster, and (iv) the search for the optimal HMM parameters for each HMM. Preliminary results are given to support the proposed methodology.
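The four nested levels of search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. The abstract does not specify the emission model, how HMM structure is varied, or which objective criterion is used. The sketch therefore assumes discrete-symbol sequences and reduces "HMM structure" to the number of states. It uses the Bayesian information criterion (BIC) both to pick each cluster's state count (level iii) and to score whole partitions (level i). It also reads level (ii) as a k-means-style search over cluster memberships for a fixed number of clusters. All function names (`random_stochastic`, `forward_backward`, `baum_welch`, `best_hmm`, `cluster_k`, `cluster_sequences`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: discrete-symbol sequences, "HMM structure" reduced to the
# number of states, and BIC assumed as the criterion at levels (i) and (iii).

def random_stochastic(rng, rows, cols):
    # random row-stochastic matrix; the +0.1 offset keeps every entry positive
    M = rng.random((rows, cols)) + 0.1
    return M / M.sum(axis=1, keepdims=True)

def forward_backward(seq, pi, A, B):
    """Scaled forward-backward pass for one sequence. Returns state posteriors
    (T x N), expected transition counts (N x N), and log P(seq | model)."""
    T, N = len(seq), len(pi)
    alpha, c = np.zeros((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, seq[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, seq[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, seq[t + 1]] * beta[t + 1]) / c[t + 1]
    xi_sum = np.zeros((N, N))
    for t in range(T - 1):
        xi_sum += np.outer(alpha[t], B[:, seq[t + 1]] * beta[t + 1]) * A / c[t + 1]
    return alpha * beta, xi_sum, np.log(c).sum()

def loglik(seq, model):
    return forward_backward(seq, *model)[2]

def baum_welch(seqs, n_states, n_symbols, n_iter=15, seed=0, eps=1e-6):
    """Level (iv): Baum-Welch (EM) estimates of (pi, A, B) for a fixed state count."""
    rng = np.random.default_rng(seed)
    pi = random_stochastic(rng, 1, n_states)[0]
    A = random_stochastic(rng, n_states, n_states)
    B = random_stochastic(rng, n_states, n_symbols)
    for _ in range(n_iter):
        # eps pseudo-counts keep the re-estimated probabilities strictly positive
        pi_acc = np.full(n_states, eps)
        A_acc = np.full((n_states, n_states), eps)
        B_acc = np.full((n_states, n_symbols), eps)
        for seq in seqs:
            gamma, xi_sum, _ = forward_backward(seq, pi, A, B)
            pi_acc += gamma[0]
            A_acc += xi_sum
            for t, symbol in enumerate(seq):
                B_acc[:, symbol] += gamma[t]
        pi = pi_acc / pi_acc.sum()
        A = A_acc / A_acc.sum(axis=1, keepdims=True)
        B = B_acc / B_acc.sum(axis=1, keepdims=True)
    return pi, A, B

def n_params(n_states, n_symbols):
    # free parameters in the initial, transition, and emission distributions
    return (n_states - 1) + n_states * (n_states - 1) + n_states * (n_symbols - 1)

def best_hmm(seqs, n_symbols, state_range):
    """Level (iii): keep the state count with the highest BIC; returns (loglik, k, model)."""
    log_n = np.log(sum(len(s) for s in seqs))
    best, best_bic = None, -np.inf
    for n in state_range:
        model = baum_welch(seqs, n, n_symbols)
        ll = sum(loglik(s, model) for s in seqs)
        k = n_params(n, n_symbols)
        bic = ll - 0.5 * k * log_n
        if bic > best_bic:
            best, best_bic = (ll, k, model), bic
    return best

def cluster_k(seqs, K, n_symbols, state_range, n_rounds=5, seed=0):
    """Level (ii): k-means-style reassignment of sequences among up to K cluster HMMs."""
    labels = np.random.default_rng(seed).integers(K, size=len(seqs))

    def fit_all(labels):
        # clusters that end up empty are dropped, so fewer than K may survive
        return {k: best_hmm([s for s, lab in zip(seqs, labels) if lab == k],
                            n_symbols, state_range)
                for k in set(labels.tolist())}

    for _ in range(n_rounds):
        fits = fit_all(labels)
        # move each sequence to the cluster HMM under which it is most likely
        new = np.array([max(fits, key=lambda k: loglik(s, fits[k][2])) for s in seqs])
        if np.array_equal(new, labels):
            break
        labels = new
    else:
        fits = fit_all(labels)  # no convergence: refit so the score matches labels
    log_n = np.log(sum(len(s) for s in seqs))
    score = sum(ll - 0.5 * k * log_n for ll, k, _ in fits.values())
    return score, labels

def cluster_sequences(seqs, n_symbols, k_range=(1, 2, 3), state_range=(1, 2, 3)):
    """Level (i): keep the partition, over candidate K, with the highest BIC."""
    score, labels = max((cluster_k(seqs, K, n_symbols, state_range) for K in k_range),
                        key=lambda r: r[0])
    return len(set(labels.tolist())), labels, score
```

For example, `cluster_sequences(seqs, n_symbols=3, k_range=(1, 2), state_range=(1, 2))` returns the number of non-empty clusters, a label per sequence, and the partition's BIC score. The nesting mirrors the four levels directly, at the cost of rerunning the full state-count search on every reassignment round. A richer notion of HMM structure, such as which transitions are allowed, would replace the `state_range` loop in `best_hmm`.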
