Modified Mel Filter Bank to Compute MFCC of Subsampled Speech

Mel Frequency Cepstral Coefficients (MFCCs) are the most widely used features in speech and speaker recognition applications. In this work, we propose a modified Mel filter bank to extract MFCCs from subsampled speech. We also propose a stronger metric that more effectively captures the correlation between the MFCCs of original speech and the MFCCs of resampled speech. The proposed filter bank construction is found to perform well, yielding recognition accuracies on resampled speech close to those obtained on original speech.
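The abstract does not spell out the modified construction, so the following is not the paper's method. It is a minimal Python sketch of a conventional triangular Mel filter bank, assuming the common HTK-style Mel formula m = 2595 log10(1 + f/700), followed by an MFCC step (log filter energies, then an unnormalized DCT-II). The helper names `mel_filter_bank` and `mfcc_from_power` and all parameter values are hypothetical. The sketch also illustrates one simple, assumed way to make filter banks comparable across sampling rates: keep the frame duration fixed, so the FFT bin spacing fs/n_fft stays the same, and place the filters over the same absolute band in Hz, capped at the lower Nyquist frequency.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style Mel scale (assumed): m = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, fs, f_low=0.0, f_high=None):
    """Hypothetical helper: n_filters triangular filters, each a list of
    n_fft // 2 + 1 weights over the one-sided FFT bins of a signal sampled at fs."""
    if f_high is None:
        f_high = fs / 2.0  # conventional choice: span up to the Nyquist frequency
    if not 0.0 <= f_low < f_high <= fs / 2.0:
        raise ValueError("require 0 <= f_low < f_high <= fs / 2")
    n_bins = n_fft // 2 + 1
    bin_hz = [k * fs / n_fft for k in range(n_bins)]  # frequency of each FFT bin
    # n_filters + 2 edge frequencies, equally spaced on the Mel scale
    m_lo, m_hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_hi - m_lo) / (n_filters + 1)
    edges = [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:      # rising slope of the triangle
                row.append((f - left) / (center - left))
            elif center < f < right:    # falling slope of the triangle
                row.append((right - f) / (right - center))
            else:                       # bin lies outside this filter
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_from_power(power, bank, n_ceps=13):
    """Hypothetical helper: MFCCs of one frame from its one-sided power spectrum."""
    # Log of each filter's weighted energy, floored to avoid log(0)
    log_e = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
             for row in bank]
    n = len(log_e)
    # Unnormalized DCT-II of the log energies; keep the first n_ceps coefficients
    return [sum(e * math.cos(math.pi * c * (i + 0.5) / n) for i, e in enumerate(log_e))
            for c in range(n_ceps)]


# Same 32 ms frame at both rates, so both FFTs have 31.25 Hz bin spacing
full = mel_filter_bank(20, 512, 16000.0, f_high=4000.0)  # original rate
sub = mel_filter_bank(20, 256, 8000.0, f_high=4000.0)    # subsampled by 2
print(all(r_f[:129] == r_s for r_f, r_s in zip(full, sub)))  # prints True
```

With fs = 16 kHz, n_fft = 512 and fs = 8 kHz, n_fft = 256, both banks restricted to 0 to 4000 Hz have identical weights on the 129 shared bins, whereas the default f_high = fs/2 spreads the filters over a different band for each rate. Real decimated spectra still differ from the original on those bins (for example through the anti-aliasing filter applied before decimation), so matched filter banks alone do not guarantee identical MFCCs.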