On the application of mixture AR hidden Markov models to text independent speaker recognition

Linear predictive hidden Markov models have proved to be efficient for statistically modeling speech signals. The possible application of such models to statistical characterization of the speaker himself is described and evaluated. The results show that even with a short sequence of only four isolated digits, a speaker can be verified with an average equal-error rate of less than 3 %. These results are slightly better than the results obtained using speaker-dependent vector quantizers, with comparable numbers of spectral vectors. The small improvement over the vector quantization approach indicates the weakness of the Markovian transition probabilities for characterizing speaker-dependent transitional information. >

[1]  Stephen E. Levinson,et al.  Continuously variable duration hidden Markov models for speech analysis , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Biing-Hwang Juang,et al.  A vector quantization approach to speaker recognition , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3]  A. B. Poritz,et al.  Linear predictive hidden Markov models and the speech signal , 1982, ICASSP.

[4]  Aaron E. Rosenberg,et al.  Report: A vector quantization approach to speaker recognition , 1987, AT&T Technical Journal.

[5]  R. Gray,et al.  Vector quantization , 1984, IEEE ASSP Magazine.

[6]  Amir Dembo,et al.  A minimum discrimination information approach for hidden Markov modeling , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  L. Baum,et al.  A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .

[8]  Aaron E. Rosenberg,et al.  Evaluation of a vector quantization talker recognition system in text independent and text dependent modes , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[10]  B.-H. Juang,et al.  On the hidden Markov model and dynamic time warping for speech recognition — A unified view , 1984, AT&T Bell Laboratories Technical Journal.

[11]  Biing-Hwang Juang,et al.  Mixture autoregressive hidden Markov models for speech signals , 1985, IEEE Trans. Acoust. Speech Signal Process..

[12]  John A. Hartigan,et al.  Clustering Algorithms , 1975 .

[13]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[14]  Biing-Hwang Juang,et al.  Maximum likelihood estimation for multivariate mixture observations of markov chains , 1986, IEEE Trans. Inf. Theory.

[15]  Jr. G. Forney,et al.  The viterbi algorithm , 1973 .

[16]  Lawrence R. Rabiner,et al.  A segmental k-means training procedure for connected word recognition , 1986, AT&T Technical Journal.