Speaker age and gender classification using GMM supervector and NAP channel compensation method

One of the most important factors affecting the performance of speech-based recognition systems is the difference between training and test conditions. Nuisance attribute projection (NAP) is an effective method for eliminating these differences, which are called channel effects. In this study, the effect of the NAP approach on determining speakers' age and gender groups is investigated. Mel-frequency cepstral coefficients (MFCCs) and their delta coefficients are used as features, and Gaussian mixture models (GMMs) adapted from a universal background model (UBM) by the maximum a posteriori (MAP) method are used to model the age and gender classes. The GMM obtained for each utterance is converted into a mean supervector, the supervectors are fed to a support vector machine (SVM), and the utterances are classified according to the age and gender group of the speakers. A linear GMM kernel based on the Kullback–Leibler divergence is used instead of standard SVM kernels. To determine the optimum values of the remaining parameters, the NAP channel subspace size is varied between 20 and 200 and the number of GMM components between 32 and 512. In tests on the aGender database, the optimum number of components is found to be 128 and the optimum NAP channel subspace size 45. With these optimum parameters, using NAP increases the joint age and gender classification accuracy of the system from 60.52% to 62.03%. In addition, age classification accuracy increases from 60.23% to 61.82%, and gender classification accuracy from 91.71% to 92.30%.
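The modeling step summarized above (MAP adaptation from a UBM, then a mean supervector scored with a Kullback–Leibler-based linear kernel) can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: it assumes diagonal-covariance GMMs, adaptation of the means only, and a relevance factor of 16 (a commonly used value, assumed here). The function names `map_adapt_means` and `kl_supervector` and the toy data are hypothetical. Scaling each adapted mean by sqrt(w_k) Sigma_k^(-1/2) turns the kernel sum_k w_k mu_k^a' Sigma_k^(-1) mu_k^b into an ordinary dot product of supervectors.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """Per-frame, per-component log density of diagonal Gaussians: (T, D) -> (T, K)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                      # (T, K, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)     # (T, K)
    log_det = np.sum(np.log(variances), axis=1)                   # (K,)
    return -0.5 * (D * np.log(2.0 * np.pi) + log_det[None, :] + mahal)


def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM to the frames X.
    relevance=16.0 is an assumed default, not a value taken from the paper."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)                     # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)                       # responsibilities, (T, K)
    n_k = post.sum(axis=0)                                        # zeroth-order statistics, (K,)
    e_k = (post.T @ X) / np.maximum(n_k, 1e-10)[:, None]          # per-component data mean, (K, D)
    alpha = (n_k / (n_k + relevance))[:, None]                    # adaptation coefficient in [0, 1)
    return alpha * e_k + (1.0 - alpha) * means


def kl_supervector(adapted_means, weights, variances):
    """Stack sqrt(w_k) * Sigma_k^(-1/2) * mu_k so that the dot product of two
    supervectors equals sum_k w_k * mu_k^a' Sigma_k^(-1) mu_k^b."""
    scaled = np.sqrt(weights)[:, None] * adapted_means / np.sqrt(variances)
    return scaled.reshape(-1)                                     # (K * D,)


# Toy example (hypothetical data): a random 4-component UBM over 3-dimensional features.
rng = np.random.default_rng(0)
K, D = 4, 3
ubm_w = np.full(K, 1.0 / K)
ubm_mu = rng.normal(size=(K, D))
ubm_var = np.ones((K, D))
frames = rng.normal(loc=0.5, size=(200, D))   # stand-in for the MFCC + delta frames of one utterance
sv = kl_supervector(map_adapt_means(frames, ubm_w, ubm_mu, ubm_var), ubm_w, ubm_var)
print(sv.shape)  # (12,)
```

Because the kernel reduces to an inner product of these scaled supervectors, they can be passed directly to any linear SVM, and a channel-compensation projection can be applied to the same vectors beforehand.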

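NAP removes a low-dimensional nuisance subspace from the supervectors before SVM training by applying the projection P = I - U U^T, where the columns of U are orthonormal nuisance directions. The sketch below assumes those directions are the leading principal directions of within-group scatter, with each supervector centered on the mean of its own group (for example, all recordings of one speaker); the abstract does not state how the paper defines these groups. The names `train_nap` and `apply_nap` and the synthetic data are hypothetical. The `corank` argument plays the role of the NAP channel subspace size that the abstract varies from 20 to 200.

```python
import numpy as np


def train_nap(supervectors, groups, corank):
    """Return an orthonormal nuisance basis U of shape (dim, corank).

    Each supervector is centered on the mean of its own group, so what remains
    is within-group (nuisance) variability; its leading principal directions
    are taken as the nuisance subspace."""
    X = np.asarray(supervectors, dtype=float)
    groups = np.asarray(groups)
    centered = np.empty_like(X)
    for g in np.unique(groups):
        idx = groups == g
        centered[idx] = X[idx] - X[idx].mean(axis=0)
    # Rows of vt are orthonormal and ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:corank].T


def apply_nap(supervectors, U):
    """Apply P = I - U U^T to every row without forming the dim x dim matrix."""
    X = np.asarray(supervectors, dtype=float)
    return X - (X @ U) @ U.T


# Synthetic check (hypothetical data): 5 groups x 6 sessions, each session shifted by a
# random amount along one fixed "channel" direction, plus small noise.
rng = np.random.default_rng(1)
dim, n_groups, n_sess = 20, 5, 6
channel_dir = rng.normal(size=dim)
channel_dir /= np.linalg.norm(channel_dir)
group_means = rng.normal(size=(n_groups, dim))
groups = np.repeat(np.arange(n_groups), n_sess)
X = (group_means[groups]
     + 3.0 * rng.normal(size=(groups.size, 1)) * channel_dir
     + 0.05 * rng.normal(size=(groups.size, dim)))
U = train_nap(X, groups, corank=1)
X_comp = apply_nap(X, U)
print(abs(U[:, 0] @ channel_dir))  # |cosine| between learned and true nuisance direction
```

Computing `X - (X @ U) @ U.T` avoids building the dense projection matrix, whose size grows with the square of the supervector dimension; the projected supervectors are then used for both SVM training and testing.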