Perceptual MVDR-based cepstral coefficients (PMCCs) for high accuracy speech recognition

This paper describes an accurate feature representation for continuous clean speech recognition. The main components of the technique are a moderate-order Linear Predictive (LP) analysis and the computation of the Minimum Variance Distortionless Response (MVDR) spectrum from the resulting LP coefficients. This feature representation, Perceptual MVDR-based Cepstral Coefficients (PMCCs), was earlier shown to outperform MFCCs under various noise conditions, with an emphasis on car noise [1]. That improvement was attributed to the superior spectrum and envelope modeling properties of the MVDR methodology. This study shows that the representation is also quite effective for clean speech recognition. In particular, PMCCs are shown to model the spectral envelope more accurately and to reduce speaker variability. This, in turn, yields a 12.8% relative word error rate (WER) reduction over MFCCs on the combined Wall Street Journal (WSJ) Nov'92 dev/eval sets. Accurate envelope modeling and reduced speaker variability also lead to faster decoding through more efficient pruning in the search stage, for a total gain in decoding speed of 22.4% relative to standard MFCC features. It is also shown that PMCC extraction is not much more computationally demanding than MFCC extraction. We therefore conclude that the PMCC feature extraction scheme is a better representation of both clean and noisy speech than the MFCC scheme.
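
The sketch below illustrates only the core step named in the abstract: LP analysis of a frame followed by the MVDR (Capon) spectrum computed directly from the LP coefficients via the well-known fast parameterization (the mu(k) sequence). It is not the authors' implementation; the frame length, LP order, FFT size, and synthetic test signal are illustrative assumptions, and the perceptual (Mel) warping and cepstral stages of the full PMCC front-end are omitted.

```python
# Minimal sketch of the MVDR-from-LP step underlying PMCCs (assumed parameters, not the paper's code).
import numpy as np
from scipy.signal import get_window
from scipy.linalg import solve_toeplitz


def lpc(frame, order):
    """Autocorrelation-method LP analysis: returns a = [1, a1, ..., aM] and the prediction error power."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a_rest = solve_toeplitz((r[:-1], r[:-1]), -r[1:])   # solve the Toeplitz normal equations R a = -r
    a = np.concatenate(([1.0], a_rest))
    err = r[0] + np.dot(a_rest, r[1:])                  # prediction error power
    return a, err


def mvdr_spectrum(a, err, n_fft=512):
    """MVDR spectrum from LP coefficients via the mu(k) parameters (fast MVDR computation)."""
    M = len(a) - 1
    mu = np.zeros(M + 1)
    for k in range(M + 1):
        i = np.arange(0, M - k + 1)
        mu[k] = np.sum((M + 1 - k - 2 * i) * a[i] * a[i + k]) / err
    # mu(-k) = mu(k) for real signals, so the denominator is mu(0) + 2 * sum_k mu(k) cos(w k)
    phase = np.exp(-1j * 2 * np.pi * np.outer(np.arange(1, M + 1), np.arange(n_fft)) / n_fft)
    denom = mu[0] + 2.0 * np.real(np.sum(mu[1:, None] * phase, axis=0))
    return 1.0 / np.maximum(denom, 1e-12)


# Illustrative usage on a synthetic voiced-like frame (assumed sampling rate and moderate LP order).
fs, order = 16000, 30
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 150 * t) + 0.3 * np.random.randn(len(t))
frame *= get_window("hamming", len(frame))
a, err = lpc(frame, order)
spec = mvdr_spectrum(a, err)   # smooth envelope; a Mel filter bank + DCT would follow to obtain PMCCs
```

The MVDR envelope obtained this way is smoother than the raw LP spectrum at comparable model orders, which is consistent with the envelope-modeling advantage the abstract attributes to the MVDR methodology.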

[1] A. W. M. van den Enden et al., Discrete-Time Signal Processing, 1989.

[2] G. Lothian et al., "Spectral Analysis," Nature, 1971.

[3] B. Porat et al., Digital Spectral Analysis with Applications, 1988.

[4] Bhaskar D. Rao et al., "MVDR based feature extraction for robust speech recognition," Proc. IEEE ICASSP, 2001.

[5] Steve Rogers et al., Adaptive Filter Theory, 1996.

[6] Liang Gu et al., "Perceptual harmonic cepstral coefficients as the front-end for speech recognition," Proc. INTERSPEECH, 2000.

[7] Reinhold Häb-Umbach, "Investigations on inter-speaker variability in the feature space," Proc. IEEE ICASSP, 1999.

[8] Ali H. Sayed et al., "A survey of spectral factorization methods," Numerical Linear Algebra with Applications, 2001.

[9] Richard O. Duda et al., Pattern Classification and Scene Analysis, Wiley-Interscience, 1974.

[10] H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," The Journal of the Acoustical Society of America, 1990.

[11] Jean-Pierre Adoul et al., "Frequency-domain spectral envelope estimation for low rate coding of speech," Proc. IEEE ICASSP, 1999.

[12] John H. L. Hansen et al., "A comparative study of traditional and newly proposed features for recognition of speech under stress," IEEE Transactions on Speech and Audio Processing, 2000.

[13] W. M. Carey et al., Digital Spectral Analysis: With Applications, 1986.

[14] Amro El-Jaroudi et al., "Discrete all-pole modeling," IEEE Transactions on Signal Processing, 1991.

[15] J. Makhoul, "Linear prediction: A tutorial review," Proceedings of the IEEE, 1975.

[16] Bhaskar D. Rao et al., "All-pole modeling of speech based on the minimum variance distortionless response spectrum," Conference Record of the Asilomar Conference on Signals, Systems and Computers, 2000.

[17] Satya Dharanipragada et al., "Perceptual MVDR-based cepstral coefficients (PMCCs) for robust speech recognition," Proc. IEEE ICASSP, 2003.

[18] Hermann Ney et al., "The RWTH large vocabulary continuous speech recognition system," Proc. IEEE ICASSP, 1998.