Hidden Markov modeling of speech using Toeplitz covariance matrices

Abstract: We study hidden Markov modeling of speech waveforms using structured covariance matrices and apply it to the recognition of clean and noisy speech signals. Waveform modeling allows easier model adaptation in additive noise than cepstral modeling of speech does. Waveform modeling with autoregressive (AR) structured covariances has been studied and applied extensively. Other covariance structures are possible, however, and here we consider waveform modeling with Toeplitz and circulant structured covariances. We detail maximum likelihood (ML) routines for hidden Markov model training and recognition with these matrices, as well as ML routines for speech gain estimation. We show that, under certain conditions, the asymptotic probability of recognition error with Toeplitz and circulant matrices is equivalent to that with AR matrices. In experiments on isolated digits in clean conditions, the Toeplitz covariance structure outperforms the AR structure and performs similarly to a cepstral system reported in the literature on the same database. In additive Gaussian noise, we demonstrate performance superior to that of both the cepstral system and the AR system.
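To make the circulant case concrete, the following is a minimal sketch, not the paper's training or recognition routine. It relies only on the standard fact that a circulant matrix is diagonalized by the discrete Fourier transform, so the zero-mean Gaussian log-likelihood of a waveform frame can be written in terms of the frame's periodogram and the eigenvalues of the covariance. The function names `dft` and `circulant_loglik`, the use of a single frame, and the naive O(n^2) transform are assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (illustrative, not optimized)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def circulant_loglik(x, c):
    """Log-likelihood of a zero-mean Gaussian frame x whose covariance is the
    symmetric circulant matrix with first column c (so c[k] == c[n - k]).

    Hypothetical helper: the eigenvalues of a circulant matrix are the DFT of
    its first column, and the quadratic form x^T C^{-1} x reduces to a sum of
    periodogram values divided by those eigenvalues.
    """
    n = len(x)
    if len(c) != n:
        raise ValueError("x and c must have the same length")
    eigenvalues = [v.real for v in dft(c)]
    if min(eigenvalues) <= 0.0:
        raise ValueError("covariance must be positive definite")
    periodogram = [abs(v) ** 2 / n for v in dft(x)]
    return -0.5 * sum(
        math.log(2.0 * math.pi * lam) + p / lam
        for lam, p in zip(eigenvalues, periodogram)
    )
```

For example, `circulant_loglik([1.0, 0.0], [2.0, 1.0])` evaluates the frame `[1, 0]` under the covariance `[[2, 1], [1, 2]]`. A general Toeplitz covariance is not diagonalized this way and needs a different evaluation, which this sketch does not attempt.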
