A minimax classification approach with application to robust speech recognition

A minimax approach for robust classification of parametric information sources is studied and applied to isolated-word speech recognition based on hidden Markov modeling. The goal is to reduce the sensitivity of speech recognition systems to a possible mismatch between the training and testing conditions. To this end, a generalized likelihood ratio test is developed and shown to be optimal in the sense of achieving the highest asymptotic exponential rate of decay of the error probability for the worst-case mismatch situation. The proposed approach is compared to the standard approach, where no mismatch is assumed, in recognition of noisy speech and in other realistic mismatch situations.
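The generalized likelihood ratio idea can be illustrated with a toy sketch. This is not the paper's hidden Markov model recognizer: it assumes scalar Gaussian sources with a known shared variance, and it models mismatch as a bounded shift of at most `eps` in each class mean. The names `gaussian_loglik` and `glrt_classify` are hypothetical. For each class, the likelihood is maximized over the allowed mismatch set, and the class with the largest maximized likelihood is chosen.

```python
import math


def gaussian_loglik(samples, mean, var):
    """Log-likelihood of i.i.d. samples under a Gaussian N(mean, var)."""
    n = len(samples)
    sq_err = sum((x - mean) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * var) - sq_err / (2 * var)


def glrt_classify(samples, nominal_means, var=1.0, eps=0.0):
    """Pick the class whose mismatch neighborhood best explains the samples.

    Assumption (illustrative only): the test-time mean of class k may lie
    anywhere in [nominal_means[k] - eps, nominal_means[k] + eps].
    """
    sample_mean = sum(samples) / len(samples)
    best_class, best_score = None, -math.inf
    for k, mu in enumerate(nominal_means):
        # For a Gaussian with known variance, the likelihood over the
        # interval is maximized by clipping the sample mean to it.
        mu_hat = min(max(sample_mean, mu - eps), mu + eps)
        score = gaussian_loglik(samples, mu_hat, var)
        if score > best_score:
            best_class, best_score = k, score
    return best_class
```

Setting `eps=0.0` reduces the rule to the standard maximum-likelihood classifier that assumes no mismatch, which corresponds to the baseline the abstract compares against.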
