Paper information - A minimax classification approach with application to robust speech recognition

A minimax approach for robust classification of parametric information sources is studied and applied to isolated-word speech recognition based on hidden Markov modeling. The goal is to reduce the sensitivity of speech recognition systems to a possible mismatch between the training and testing conditions. To this end, a generalized likelihood ratio test is developed and shown to be optimal in the sense of achieving the highest asymptotic exponential rate of decay of the error probability for the worst-case mismatch situation. The proposed approach is compared to the standard approach, where no mismatch is assumed, in recognition of noisy speech and in other realistic mismatch situations.
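The contrast between the standard rule and a generalized-likelihood-ratio style rule can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the sources are scalar Gaussians rather than hidden Markov models, the mismatch is modeled as an interval of allowed means around each trained mean, and the names `gaussian_loglik`, `standard_classify`, `minimax_classify`, and `radii` are hypothetical. The standard rule scores each class at its trained mean; the robust rule first maximizes the likelihood over each class's mismatch interval and then compares the maximized scores.

```python
import math


def gaussian_loglik(samples, mean, var=1.0):
    """Log-likelihood of i.i.d. Gaussian samples with the given mean and variance."""
    n = len(samples)
    sq_err = sum((x - mean) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * var) - sq_err / (2 * var)


def standard_classify(samples, means):
    """Pick the class whose trained mean gives the highest likelihood (no mismatch assumed)."""
    scores = [gaussian_loglik(samples, m) for m in means]
    return max(range(len(means)), key=scores.__getitem__)


def minimax_classify(samples, means, radii):
    """GLRT-style rule: maximize the likelihood over each class's mismatch
    interval [mean - radius, mean + radius], then pick the best class.

    For a Gaussian mean, the constrained maximizer is the sample mean
    clipped to the interval (an assumption of this toy model only).
    """
    sample_mean = sum(samples) / len(samples)
    scores = []
    for m, r in zip(means, radii):
        theta = min(max(sample_mean, m - r), m + r)  # constrained ML estimate
        scores.append(gaussian_loglik(samples, theta))
    return max(range(len(means)), key=scores.__getitem__)


# Hypothetical data: class 1 is allowed a much larger mismatch than class 0.
samples = [1.0, 1.4, 1.2]
means = [0.0, 3.0]
radii = [0.2, 2.0]

print(standard_classify(samples, means))        # 0
print(minimax_classify(samples, means, radii))  # 1
```

In this toy case the observed sample mean lies closer to the trained mean of class 0, so the standard rule picks class 0, while the robust rule picks class 1 because the observation falls inside that class's allowed mismatch interval.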

Chin-Hui Lee | Neri Merhav
