Warped discrete cosine transform cepstrum: A new feature for speech processing

In this paper, we propose a new feature for speech recognition and speaker identification applications, termed the warped discrete cosine transform cepstrum (WDCTC). The feature is obtained by replacing the discrete cosine transform (DCT) with the warped discrete cosine transform (WDCT) [4] in the discrete cosine transform cepstrum (DCTC) [2]. The WDCT is implemented as a cascade of the DCT and IIR all-pass filters. This introduces into DCTC a nonlinear frequency scale that closely follows the Bark scale, which is achieved by setting the all-pass filter parameter with an expression given by Smith and Abel [5]. The performance of WDCTC is compared with that of mel-frequency cepstral coefficients (MFCC) in speech recognition and speaker identification experiments, and WDCTC outperforms MFCC in both noisy and noiseless conditions.
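The abstract names two concrete ingredients: the Smith and Abel expression for the all-pass coefficient, and a DCT evaluated on an all-pass-warped frequency axis. The Python sketch below illustrates both under stated assumptions, using NumPy and SciPy. `smith_abel_bark_rho` uses the published Smith and Abel Bark approximation, and the all-pass sign convention (z^-1 - rho) / (1 - rho z^-1) is assumed. `warped_dct` is a hypothetical direct O(N^2) evaluation of cosine sums at warped frequencies, not the DCT plus IIR all-pass cascade the paper uses. `dct_cepstrum` assumes DCTC is the inverse DCT of the log-magnitude transform, which the abstract does not spell out. All function names are illustrative.

```python
import numpy as np
from scipy.fft import dct, idct


def smith_abel_bark_rho(fs_hz: float) -> float:
    """First-order all-pass coefficient approximating the Bark scale (Smith and Abel)."""
    return 1.0674 * np.sqrt((2.0 / np.pi) * np.arctan(0.06583 * fs_hz / 1000.0)) - 0.1916


def allpass_warp(omega, rho):
    """Frequency mapping of the all-pass (z^-1 - rho) / (1 - rho z^-1) (assumed convention).

    omega is in rad/sample on [0, pi]; calling with -rho gives the inverse map.
    """
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan(rho * np.sin(omega) / (1.0 - rho * np.cos(omega)))


def warped_dct(x, rho):
    """Hypothetical direct O(N^2) warped DCT-II (not the paper's DCT + IIR cascade).

    A plain DCT-II evaluates cosine sums at pi*k/N. Here basis k uses the physical
    frequency whose warped image is pi*k/N, so for rho > 0 the basis frequencies
    crowd toward low frequencies.
    """
    x = np.asarray(x, dtype=float)
    n_len = x.size
    uniform_grid = np.pi * np.arange(n_len) / n_len
    theta = allpass_warp(uniform_grid, -rho)  # warped grid -> physical frequencies
    basis = np.cos(np.outer(theta, np.arange(n_len) + 0.5))
    return basis @ x


def dct_cepstrum(x, n_ceps=13, rho=0.0, eps=1e-10):
    """Hypothetical (W)DCTC: inverse DCT of the log-magnitude (warped) DCT spectrum.

    rho = 0 gives a DCT-based cepstrum; rho > 0 gives the warped variant.
    eps keeps log() finite where a cosine coefficient is (near) zero.
    """
    spectrum = warped_dct(x, rho)
    log_mag = np.log(np.abs(spectrum) + eps)
    return idct(log_mag, type=2, norm="ortho")[:n_ceps]


if __name__ == "__main__":
    fs = 16000
    rho = smith_abel_bark_rho(fs)  # about 0.576 at 16 kHz
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(512) * np.hamming(512)  # stand-in for a speech frame
    # Sanity check: with rho = 0 the sketch equals half of SciPy's unnormalized DCT-II.
    assert np.allclose(2 * warped_dct(frame, 0.0), dct(frame, type=2))
    print(f"rho = {rho:.3f}")
    print(dct_cepstrum(frame, n_ceps=13, rho=rho).shape)
```

Because `rho = 0` reduces the warped transform to an ordinary DCT-II, the same function covers both the DCTC and the WDCTC cases. A positive `rho` places the basis frequencies below the uniform grid, giving finer resolution at low frequencies, as the Bark scale does.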

Douglas D. O'Shaughnessy | Rangarao Muralishankar | Abhijeet Sangwan

[1] Douglas A. Reynolds et al. Corpora for the evaluation of speaker recognition systems. 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP99), 1999.

[2] Sanjit K. Mitra et al. Warped discrete cosine transform and its application in image compression. IEEE Trans. Circuits Syst. Video Technol., 2000.

[3] Stephen A. Martucci et al. Symmetric convolution and the discrete sine and cosine transforms. IEEE Trans. Signal Process., 1993.

[4] Vishwa Gupta et al. Decision rules for speaker-independent isolated word recognition. ICASSP, 1984.

[5] Douglas A. Reynolds et al. Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process., 1995.

[6] Soo Ngee Koh et al. Noisy speech enhancement using discrete cosine transform. Speech Commun., 1998.

[7] Rangarao Muralishankar et al. DCT based pseudo complex cepstrum. 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002.

[8] Rangarao Muralishankar et al. Pseudo Complex Cepstrum Using Discrete Cosine Transform. Int. J. Speech Technol., 2005.

[9] Julius O. Smith et al. Bark and ERB bilinear transforms. IEEE Trans. Speech Audio Process., 1999.

[10] Stan Davis et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se, 1980.