Robust Analysis and Weighting on MFCC Components for Speech Recognition and Speaker Identification

Mismatch between training and testing data is a major source of error in both automatic speech recognition (ASR) and automatic speaker identification (ASI). In this paper, we first present a statistical weighting concept that exploits the unequal sensitivity of mel-frequency cepstral coefficient (MFCC) components to mismatch arising from ambient noise, recording equipment, transmission channels, and inter-speaker variation. Based on this concept, we then design a new Kullback-Leibler (KL) distance based weighting algorithm for real-world problems in which label information is often unavailable. We evaluate the algorithm on ASR with mismatch caused by different speakers and on ASI with mismatch caused by channel noise. Experimental results demonstrate the effectiveness and robustness of the proposed method.
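The abstract does not spell out the weighting algorithm, so the following is only a minimal sketch of one way a KL-distance-based weighting of MFCC components could look, not the authors' method. It assumes each component is modelled as a univariate Gaussian on the training and testing frames, measures the mismatch per component with a symmetric KL distance, and gives smaller weights to components with larger mismatch. The function names, the Gaussian assumption, and the mapping `1 / (1 + distance)` are all hypothetical choices made for illustration; no labels are used.

```python
import math
from statistics import fmean, pvariance


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two univariate Gaussians given their means and variances."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def kl_component_weights(train_frames, test_frames, eps=1e-8):
    """Return one weight per MFCC component (hypothetical scheme, weights sum to 1).

    train_frames, test_frames: lists of frames, each frame a list of coefficients.
    Components whose training and testing distributions differ more receive
    smaller weights. This inverse mapping is an assumption, not from the paper.
    """
    n_coeffs = len(train_frames[0])
    raw = []
    for k in range(n_coeffs):
        train_k = [frame[k] for frame in train_frames]
        test_k = [frame[k] for frame in test_frames]
        # eps keeps the variances strictly positive for constant components.
        mu_tr, var_tr = fmean(train_k), pvariance(train_k) + eps
        mu_te, var_te = fmean(test_k), pvariance(test_k) + eps
        # Symmetric KL distance: sum of the two directed divergences.
        distance = (gaussian_kl(mu_tr, var_tr, mu_te, var_te)
                    + gaussian_kl(mu_te, var_te, mu_tr, var_tr))
        raw.append(1.0 / (1.0 + distance))
    total = sum(raw)
    return [r / total for r in raw]


# Toy data: component 0 is identical in both sets, component 1 is shifted.
train = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
test = [[0.0, 4.0], [2.0, 6.0], [4.0, 8.0]]
weights = kl_component_weights(train, test)
```

In this toy example the shifted component ends up with the smaller weight. In a recognizer, such weights might scale each component's contribution to a distance or log-likelihood score, though how the paper applies them is not stated in the abstract.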
