Abstract

In this paper, we continue our investigation of the warped discrete cosine transform cepstrum (WDCTC), which was introduced earlier as a new speech processing feature [1]. Here, we study the statistical properties of the WDCTC and compare them with those of the mel-frequency cepstral coefficients (MFCC). Compared with the MFCC, the WDCTC shows several notable properties: its statistical distribution is closer to Gaussian and has lower variance, it yields better vowel cluster separability, it forms tighter vowel clusters, and it generates better codebooks. Further, we employ the WDCTC and MFCC features in a 5-vowel recognition task using vector quantization (VQ) and 1-nearest-neighbour (1-NN) classifiers. In our experiments, the WDCTC consistently outperforms the MFCC.

1. Introduction

We recently introduced the warped discrete cosine transform cepstrum (WDCTC) as a new speech processing feature and showed that it outperforms the mel-frequency cepstral coefficients (MFCC) in vowel recognition and speaker identification tasks [1]. The WDCTC has shown good promise as a speech processing feature, which encourages us to investigate the feature and its statistical properties further.

A large volume of training data is required to build speaker-independent speech recognition systems. One technique for reducing the data size is to cluster the data and choose a reasonable number of representative feature vectors to form codebooks [2]. Codebook techniques are therefore relevant and practical for speech recognition systems. We form WDCTC and MFCC codebooks using a k-means clustering algorithm and compare the codebook statistics for clean and noisy vowels using the coefficient of variation and the overlap ratio (defined later). Our experiments demonstrate that the WDCTC codebooks represent the underlying vowel data better than the MFCC codebooks.

In order to compare the classification capability of the features, the WDCTC and MFCC are employed in a 5-vowel recognition task.
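The k-means codebook formation described in this section can be sketched as follows. This is a minimal sketch, assuming plain k-means (Lloyd's algorithm) with squared Euclidean distance and random initialisation from the training vectors; the paper's actual initialisation, distance measure, and codebook size are not stated here. The helper names `kmeans_codebook` and `coefficient_of_variation` are hypothetical. The coefficient of variation is computed with its textbook definition (standard deviation divided by the mean); which codebook statistic the paper applies it to, and the overlap ratio, are defined later in the paper and are not reproduced here.

```python
import math
import random


def squared_distance(a, b):
    # Squared Euclidean distance (an assumption; the paper's measure is not given here).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_codebook(vectors, k, iterations=50, seed=0):
    """Hypothetical helper: build a k-entry codebook with plain k-means (Lloyd's algorithm)."""
    rng = random.Random(seed)
    # Assumed initialisation: k training vectors drawn at random.
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        # Assignment step: each training vector joins its nearest codeword's cluster.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            j = min(range(k), key=lambda i: squared_distance(v, codebook[i]))
            clusters[j].append(v)
        # Update step: move each codeword to the centroid of its cluster.
        new_codebook = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_codebook.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                # An empty cluster keeps its previous codeword.
                new_codebook.append(codebook[j])
        if new_codebook == codebook:
            break  # Converged: the codewords no longer change.
        codebook = new_codebook
    return codebook


def coefficient_of_variation(values):
    # Textbook definition: population standard deviation divided by the mean.
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance) / mean


# Toy 2-D example (hypothetical data): two well-separated groups of "frames".
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
print(sorted(kmeans_codebook(data, 2)))
print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))  # 0.4
```

On the toy data the two codewords settle at the centroids of the two groups. In the setting described above, `vectors` would instead hold the WDCTC or MFCC frames of one vowel, and `k` would be the chosen codebook size.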
Vector quantization (VQ) and 1-nearest-neighbour (1-NN) [2] classifiers are used, and their recognition performance is reported. We also investigate the clean and noisy vowel clusters formed by the WDCTC and MFCC features and present the average separability of the vowel classes.
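The two classifiers mentioned above can be sketched as follows, under assumptions this section does not confirm. Each vowel class is assumed to have its own codebook (for example, one built with k-means). A test token, given as a list of feature frames, is assigned by the VQ rule to the class whose codebook yields the lowest average quantization distortion. The 1-NN rule labels a single feature vector with the class of its closest training vector. Squared Euclidean distance is assumed throughout, and the function names and toy data are hypothetical.

```python
def squared_distance(a, b):
    # Squared Euclidean distance (an assumption; the paper's measure is not given here).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_distortion(frames, codebook):
    # Average distance from each frame to its nearest codeword.
    return sum(min(squared_distance(f, c) for c in codebook) for f in frames) / len(frames)


def vq_classify(frames, codebooks):
    # codebooks: dict mapping a vowel label to that vowel's list of codewords.
    # Assumed decision rule: pick the vowel whose codebook gives the least distortion.
    return min(codebooks, key=lambda label: vq_distortion(frames, codebooks[label]))


def nn1_classify(x, train):
    # train: list of (feature_vector, label) pairs.
    # Return the label of the single closest training vector.
    _, label = min(train, key=lambda pair: squared_distance(x, pair[0]))
    return label


# Toy 2-D example (hypothetical data, not from the paper).
codebooks = {"a": [[0.0, 0.0], [1.0, 0.0]], "i": [[5.0, 5.0], [6.0, 5.0]]}
print(vq_classify([[0.2, 0.1], [0.9, -0.1]], codebooks))  # a
train = [([0.0, 0.0], "a"), ([5.0, 5.0], "i"), ([0.0, 5.0], "u")]
print(nn1_classify([0.4, 4.6], train))  # u
```

The VQ rule compares a whole token against compact per-class codebooks. The 1-NN rule, by contrast, searches every stored training vector, so its cost grows with the size of the training set. This is one reason codebooks help when the training data is large.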
References

[1] Stephen A. Martucci et al., "Symmetric convolution and the discrete sine and cosine transforms," IEEE Trans. Signal Process., 1993.
[2] Vishwa Gupta et al., "Decision rules for speaker-independent isolated word recognition," ICASSP, 1984.
[3] Rangarao Muralishankar et al., "Pseudo complex cepstrum using discrete cosine transform," Int. J. Speech Technol., 2005.
[4] Sanjit K. Mitra et al., "Warped discrete cosine transform and its application in image compression," IEEE Trans. Circuits Syst. Video Technol., 2000.
[5] Detlev Langmann et al., "A comparative study of linear feature transformation techniques for automatic speech recognition," Proc. Fourth International Conference on Spoken Language Processing (ICSLP '96), 1996.
[6] Julius O. Smith et al., "Bark and ERB bilinear transforms," IEEE Trans. Speech Audio Process., 1999.
[7] Douglas D. O'Shaughnessy et al., "Warped discrete cosine transform cepstrum: A new feature for speech processing," 13th European Signal Processing Conference, 2005.