A Minimax Classification Approach With Application To Robust Speech Recognition

A minimax approach to the robust classification of parametric information sources is studied and applied to isolated-word speech recognition based on hidden Markov modeling. The goal is to reduce the sensitivity of speech recognition systems to a possible mismatch between training and testing conditions. To this end, a generalized likelihood ratio test is developed and shown to be optimal in the sense that it achieves the highest asymptotic exponential rate of decay of the error probability under the worst-case mismatch. The proposed approach is compared with the standard approach, which assumes no mismatch, in the recognition of noisy speech and in other realistic mismatch situations.
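The following Python sketch illustrates a generalized likelihood ratio (GLR) decision rule. It uses a deliberately simple toy model in place of the paper's hidden Markov models. Each class is a zero-mean Gaussian source with a known variance. The mismatch is assumed to be an unknown gain that scales the variance by a factor `a` confined to a known interval. For each class, the rule first maximizes the likelihood over the admissible mismatch and only then compares the classes. The standard plug-in rule, by contrast, evaluates each class at its nominal variance. The function names, the gain-interval mismatch model, and the numbers are hypothetical choices for illustration. The sketch does not reproduce the paper's test or its error-exponent analysis.

```python
import math
from typing import Optional, Sequence, Tuple


def zero_mean_gauss_loglik(x: Sequence[float], var: float) -> float:
    """Log-likelihood of i.i.d. samples x under N(0, var)."""
    n = len(x)
    s = sum(xi * xi for xi in x)
    return -0.5 * n * math.log(2.0 * math.pi * var) - s / (2.0 * var)


def glr_score(x: Sequence[float], var: float,
              gain_range: Tuple[float, float]) -> float:
    """Log-likelihood maximized over an unknown variance scale a in [lo, hi].

    Toy mismatch model (an assumption, not the paper's): test data follow
    N(0, a * var). The log-likelihood is unimodal in a, with its
    unconstrained maximum at a = sample_power / var, so the constrained
    maximizer is that value clipped to [lo, hi].
    """
    lo, hi = gain_range
    power = sum(xi * xi for xi in x) / len(x)
    a_hat = min(hi, max(lo, power / var))
    return zero_mean_gauss_loglik(x, a_hat * var)


def classify(x: Sequence[float], class_vars: Sequence[float],
             gain_range: Optional[Tuple[float, float]] = None) -> int:
    """Return the index of the best-scoring class.

    gain_range=None gives the standard plug-in rule (no mismatch assumed);
    otherwise each class is scored by its GLR statistic.
    """
    if gain_range is None:
        scores = [zero_mean_gauss_loglik(x, v) for v in class_vars]
    else:
        scores = [glr_score(x, v, gain_range) for v in class_vars]
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Data from class 0 (variance 1) observed through a gain mismatch a = 1.9.
    sample = [math.sqrt(1.9), -math.sqrt(1.9)] * 50
    class_vars = [1.0, 4.0]
    print(classify(sample, class_vars))                    # plug-in: 1
    print(classify(sample, class_vars, (1 / 1.95, 1.95)))  # GLR: 0
```

Here the mismatch (1.9) lies inside the assumed interval. The plug-in rule assigns the data to class 1, while the GLR rule recovers class 0. A single example says nothing about error rates. The paper's claim concerns the asymptotic error exponent under the worst-case mismatch.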
