Mixture autoregressive hidden Markov models for speech signals

In this paper, a signal modeling technique based upon finite mixture autoregressive probabilistic functions of Markov chains is developed and applied to the problem of speech recognition, particularly speaker-independent recognition of isolated digits. Two types of mixture probability densities are investigated: finite mixtures of Gaussian autoregressive densities (GAM) and nearest-neighbor partitioned finite mixtures of Gaussian autoregressive densities (PGAM). In the former (GAM), the observation density in each Markov state is simply a (stochastically constrained) weighted sum of Gaussian autoregressive densities, while in the latter (PGAM) it involves nearest-neighbor decoding, which in effect defines a set of partitions on the observation space. We discuss the signal modeling methodology and give experimental results on speaker-independent recognition of isolated digits. We also discuss the potential use of the modeling technique for other applications.
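The difference between the two per-state observation densities can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: each Gaussian autoregressive component is scored by the energy of the prediction residual obtained by inverse filtering the frame (samples before the frame are taken as zero), the residual variance `sigma2` is fixed, and "nearest neighbor" is read as the component with the highest log density for the frame. The function names `ar_log_density`, `gam_log_density`, and `pgam_log_density` are hypothetical.

```python
import numpy as np

def ar_log_density(x, a, sigma2=1.0):
    """Approximate log density of frame x under one Gaussian AR component.

    a = [1, a_1, ..., a_p] is the prediction-error filter (assumed form).
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    # Residual e_t = sum_i a_i * x_{t-i}, with x_t = 0 before the frame starts.
    e = np.convolve(x, a)[:len(x)]
    return -0.5 * len(x) * np.log(2 * np.pi * sigma2) - 0.5 * np.dot(e, e) / sigma2

def gam_log_density(x, weights, ar_filters):
    """GAM: log of the weighted sum of component densities.

    Weights are assumed positive and summing to one (the stochastic constraint).
    """
    terms = np.array([np.log(w) + ar_log_density(x, a)
                      for w, a in zip(weights, ar_filters)])
    m = terms.max()
    # Log-sum-exp for numerical stability.
    return m + np.log(np.exp(terms - m).sum())

def pgam_log_density(x, weights, ar_filters):
    """PGAM (assumed reading): keep only the nearest component.

    Returns the weighted log density of that component and its index,
    which identifies the partition cell the frame falls into.
    """
    scores = np.array([ar_log_density(x, a) for a in ar_filters])
    k = int(np.argmax(scores))
    return np.log(weights[k]) + scores[k], k

# Illustrative data only: a decaying frame and two first-order AR filters.
frame = [1.0, 0.5, 0.25, 0.125]
filters = [[1.0, -0.5], [1.0, 0.5]]
mix_weights = [0.5, 0.5]

gam = gam_log_density(frame, mix_weights, filters)
pgam, cell = pgam_log_density(frame, mix_weights, filters)
```

Under these assumptions the PGAM value never exceeds the GAM value for the same frame, because the sum over all weighted components is at least as large as its largest term; the returned index `cell` is what partitions the observation space.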
