Multi-frame rate based multiple-model training for robust speaker identification of disguised voice

Speaker identification systems are vulnerable to attack when a user disguises their voice. To address this issue, this paper studies how the choice of frame rate affects the accuracy of a speaker identification system on disguised voice. It also proposes a multi-frame-rate, multiple-model training method. Experimental results show that the proposed method outperforms the commonly used single-frame-rate method on three types of disguised voice taken from the CHAINS corpus: synchronous speech, fast speech, and repetitive synchronous imitation.
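
The idea of training several models per speaker, one for each frame rate, can be illustrated with a minimal sketch. The abstract does not specify the features, the model type, or how scores from the different frame rates are combined, so everything below is an assumption made for illustration: per-frame log energy stands in for conventional spectral features, a single one-dimensional Gaussian stands in for the speaker model, and the best-scoring frame rate decides each speaker's score. All function names are hypothetical.

```python
import math


def frame_signal(signal, frame_len, frame_shift):
    """Split a signal into overlapping frames; a smaller shift means a higher frame rate."""
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, frame_shift)]


def features(signal, frame_len, frame_shift):
    """Toy feature (assumption): log energy of each frame."""
    return [math.log(sum(x * x for x in f) / len(f) + 1e-10)
            for f in frame_signal(signal, frame_len, frame_shift)]


def fit_gaussian(values):
    """Toy speaker model (assumption): mean and variance of the feature values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values) + 1e-6
    return mean, var


def avg_log_likelihood(values, model):
    """Average per-frame Gaussian log-likelihood, comparable across frame rates."""
    mean, var = model
    total = sum(-0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)
                for v in values)
    return total / len(values)


def train_multi_rate(train_data, frame_len, frame_shifts):
    """Train one model per speaker and per frame shift.

    train_data maps speaker name -> list of samples.
    Returns {speaker: {frame_shift: model}}.
    """
    return {spk: {s: fit_gaussian(features(sig, frame_len, s)) for s in frame_shifts}
            for spk, sig in train_data.items()}


def identify(signal, models, frame_len):
    """Score the test signal against every speaker at every frame rate.

    Combination rule (assumption): each speaker keeps the score of its
    best-matching frame rate, and the top-scoring speaker is returned.
    """
    scores = {}
    for spk, per_rate in models.items():
        scores[spk] = max(avg_log_likelihood(features(signal, frame_len, s), m)
                          for s, m in per_rate.items())
    return max(scores, key=scores.get)
```

A call such as `train_multi_rate(data, frame_len=200, frame_shifts=[50, 100, 200])` followed by `identify(test_signal, models, frame_len=200)` exercises the whole pipeline. A single-frame-rate baseline corresponds to passing one value in `frame_shifts`.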
