Text-dependent speaker recognition by efficient capture of speaker dynamics in compressed time-frequency representations of speech

Prevalent speaker recognition methods use only spectral-envelope-based features such as MFCC, ignoring the rich speaker identity information contained in the temporal-spectral dynamics of the entire speech signal. We propose a new feature for speaker recognition, called compressed spectral dynamics (CSD), based on a compressed time-frequency representation of spoken passwords that effectively captures speaker identity. The fixed dimension of the CSD keeps classification simple while retaining the discriminatory power of the intermediate 2D time-frequency representation. The proposed MSRI-CSD text-dependent speaker recognition method uses a simple nearest-neighbor classifier and delivers performance competitive with conventional MFCC+DTW-based speaker recognition methods at significantly lower complexity.
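The abstract does not say how the time-frequency representation is compressed, so the following is only a minimal sketch of the general idea, not the MSRI-CSD method itself. It assumes a log-magnitude spectrogram compressed by a truncated 2D DCT, which gives a fixed-length vector regardless of utterance duration, followed by a Euclidean nearest-neighbor match. The function names, frame sizes, and coefficient counts are all hypothetical choices made for illustration.

```python
import numpy as np

def dct_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II basis for length-n_in inputs."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * (n + 0.5) * k / n_in) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)
    return basis

def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT with shape (n_freq_bins, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.log(np.abs(np.fft.rfft(frames, axis=0)) + 1e-8)

def csd_like_feature(signal, n_freq_coefs=8, n_time_coefs=8):
    """Assumed compression: keep the low-order 2D DCT coefficients of the spectrogram.

    The output length is n_freq_coefs * n_time_coefs whatever the utterance length.
    """
    spec = log_spectrogram(signal)
    n_freq, n_frames = spec.shape
    coefs = dct_matrix(n_freq_coefs, n_freq) @ spec @ dct_matrix(n_time_coefs, n_frames).T
    return coefs.ravel()

def nearest_neighbor(query, templates, labels):
    """Return the label of the enrolled template closest to the query (Euclidean)."""
    dists = np.linalg.norm(templates - query, axis=1)
    return labels[int(np.argmin(dists))]
```

Because every utterance maps to a vector of the same length, enrolled templates can be stacked into one array and compared directly, with no dynamic time alignment step of the kind DTW requires.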
