Warped discrete cosine transform cepstrum: A new feature for speech processing

In this paper, we propose a new feature for speech recognition and speaker identification applications. The new feature is termed the warped discrete cosine transform cepstrum (WDCTC). It is obtained by replacing the discrete cosine transform (DCT) with the warped discrete cosine transform (WDCT [4]) in the discrete cosine transform cepstrum (DCTC [2]). The WDCT is implemented as a cascade of the DCT and IIR all-pass filters. We incorporate into the DCTC a nonlinear frequency scale that closely follows the Bark scale. This is accomplished by setting the all-pass filter parameter using an expression given by Smith and Abel [5]. The performance of WDCTC is compared with that of mel-frequency cepstral coefficients (MFCC) in a speech recognition experiment and a speaker identification experiment. WDCTC outperforms MFCC in both noisy and noiseless conditions.
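As a rough illustration of how an all-pass parameter can produce a Bark-like frequency scale, the sketch below computes the parameter from the sampling rate and evaluates the resulting frequency map of a first-order all-pass filter. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the constants are the closed-form Bark fit commonly attributed to Smith and Abel (sampling rate in kHz), and the DCT stage of the WDCT is not shown.

```python
import math


def bark_allpass_coefficient(fs_hz: float) -> float:
    """All-pass parameter rho approximating the Bark scale.

    Assumption: closed-form fit attributed to Smith and Abel,
    with the sampling rate expressed in kHz.
    """
    fs_khz = fs_hz / 1000.0
    return 1.0674 * math.sqrt((2.0 / math.pi) * math.atan(0.06583 * fs_khz)) - 0.1916


def warp_frequency(omega: float, rho: float) -> float:
    """Warped frequency (rad/sample) for omega in [0, pi].

    This is the negated phase of the first-order all-pass section
    A(z) = (z^-1 - rho) / (1 - rho * z^-1), evaluated at z = exp(j * omega).
    """
    return omega + 2.0 * math.atan2(rho * math.sin(omega), 1.0 - rho * math.cos(omega))


if __name__ == "__main__":
    fs = 16000.0  # hypothetical sampling rate in Hz
    rho = bark_allpass_coefficient(fs)
    print(f"rho = {rho:.4f}")
    # Show how uniformly spaced frequencies are redistributed by the warping.
    for k in range(9):
        omega = math.pi * k / 8
        f_in = omega * fs / (2.0 * math.pi)
        f_out = warp_frequency(omega, rho) * fs / (2.0 * math.pi)
        print(f"{f_in:8.1f} Hz -> {f_out:8.1f} Hz")
```

For a positive parameter the map leaves 0 and the Nyquist frequency fixed and pushes intermediate frequencies upward, so the low-frequency region receives finer resolution, which is the behaviour a Bark-like scale calls for.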

[1] Douglas A. Reynolds, et al. Corpora for the evaluation of speaker recognition systems, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[2] Sanjit K. Mitra, et al. Warped discrete cosine transform and its application in image compression, 2000, IEEE Trans. Circuits Syst. Video Technol.

[3] Stephen A. Martucci, et al. Symmetric convolution and the discrete sine and cosine transforms, 1993, IEEE Trans. Signal Process.

[4] Vishwa Gupta, et al. Decision rules for speaker-independent isolated word recognition, 1984, ICASSP.

[5] Douglas A. Reynolds, et al. Robust text-independent speaker identification using Gaussian mixture speaker models, 1995, IEEE Trans. Speech Audio Process.

[6] Soo Ngee Koh, et al. Noisy speech enhancement using discrete cosine transform, 1998, Speech Commun.

[7] Rangarao Muralishankar, et al. DCT based pseudo complex cepstrum, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8] Rangarao Muralishankar, et al. Pseudo Complex Cepstrum Using Discrete Cosine Transform, 2005, Int. J. Speech Technol.

[9] Julius O. Smith, et al. Bark and ERB bilinear transforms, 1999, IEEE Trans. Speech Audio Process.

[10] Stan Davis, et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se, 1980.