Speaker Identification for Disguised Voices Based on Modified SVM Classifier

Voice disguise poses a significant threat in many illegal applications, so it is essential to be able to identify the unknown speaker behind a disguised voice. This work focuses on designing a modified Support Vector Machine (SVM) classifier to improve the degraded speaker identification performance for voices disguised under an extreme high-pitched condition in a neutral talking environment. The research uses three speech datasets: an Arabic Emirati-accented database, the “Speech Under Simulated and Actual Stress” (SUSAS) English database, and the “Ryerson Audio-Visual Database of Emotional Speech and Song” (RAVDESS) English database. Our results show that the modified SVM achieves an average speaker identification performance of 93.95% for disguised voices. They also indicate that the modified SVM outperforms other classical classifiers, namely K-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), Radial Basis Function (RBF), Naïve Bayes (NB), and the conventional SVM.
