Multiple Frame Rates for Feature Extraction and Reliable Frame Selection at the Decision for Speaker Identification Under Voice Disguise

Speaker identification is the task of determining which person, from a group of known speakers, produced a given speech utterance. It is used in crime-scene investigation, surveillance, and consumer electronics such as smart TVs. Its performance degrades, however, when voice disguise introduces a mismatch between the training and test speech data. This paper therefore studies how speaker identification performance is affected by three types of voice disguise from the CHAINS corpus, namely fast speech (non-imitative), synchronous speech (imitative), and repetitive synchronous imitation, alongside normal speech. It then proposes a system that combines multiple frame rates for feature extraction with reliable frame selection at the decision level. In the evaluation, the proposed system performed better overall than the baseline systems.
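The abstract does not spell out how the two ideas are implemented, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes that "multiple frame rates" means framing the same signal with several frame shifts and pooling the frames, and that "reliable" frames are those with the largest margin between the best and second-best speaker score. The function names, the margin criterion, and the `keep_fraction` parameter are all hypothetical.

```python
import numpy as np

def frame_signal(signal, frame_len, frame_shift):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = max(0, 1 + (len(signal) - frame_len) // frame_shift)
    idx = (np.arange(frame_len)[None, :]
           + frame_shift * np.arange(n_frames)[:, None])
    return signal[idx]

def multi_rate_frames(signal, frame_len, frame_shifts):
    """Frame the same signal at several frame shifts and pool the frames.

    Assumption: a different frame shift stands in for a different frame rate.
    """
    return np.vstack([frame_signal(signal, frame_len, s) for s in frame_shifts])

def identify_with_reliable_frames(frame_scores, keep_fraction=0.5):
    """Pick a speaker using only the most reliable frames.

    frame_scores: array of shape (n_frames, n_speakers) holding per-frame
    log-likelihoods from any speaker model (e.g. one GMM per speaker).
    Assumption: a frame is reliable when the gap between its best and
    second-best speaker score is large.
    """
    sorted_scores = np.sort(frame_scores, axis=1)
    margin = sorted_scores[:, -1] - sorted_scores[:, -2]
    n_keep = max(1, int(round(keep_fraction * len(margin))))
    keep = np.argsort(margin)[-n_keep:]          # indices of retained frames
    totals = frame_scores[keep].sum(axis=0)      # accumulate log-likelihoods
    return int(np.argmax(totals)), keep
```

In practice each pooled frame would first be converted to a feature vector (for example cepstral coefficients) and scored against every speaker model; the sketch starts from those per-frame scores so that it stays independent of any particular feature extractor or model.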
