A new approach with score-level fusion for the classification of a speaker age and gender

This study proposes a new approach for classifying speakers by age and gender, based on the score-level fusion of seven subsystems. The subsystems apply Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and GMM supervector-based SVM classifiers to spectral and prosodic features extracted from short-duration phone conversations; their outputs are transformed into probabilistic scores and then combined. The fused system provides improved performance in three classification categories (gender, age, and age & gender). The effect of feature type and classification method on performance is investigated by evaluating each subsystem individually and in various combinations. Experiments use the aGender database, which comprises 47 hours of phone conversations from 954 speakers. The proposed system achieves classification success rates of 90.4%, 54.1%, and 53.5% in the gender, age, and age & gender categories, respectively.
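
The score-level fusion described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the fusion rule, so the weighted sum used here is an assumption, as are the sigmoid parameters `a` and `b`, the equal weights, and all numeric scores, which are made up for illustration. Only two subsystems are shown rather than seven.

```python
import math

def softmax(log_likelihoods):
    """Map per-class log-likelihoods (e.g. from GMMs) to scores that sum to 1."""
    peak = max(log_likelihoods)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in log_likelihoods]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_scores(margins, a=-1.0, b=0.0):
    """Platt-style sigmoid mapping of SVM margins, renormalized over classes.

    a and b are placeholder values (assumption); in practice they would be
    fitted on held-out data.
    """
    raw = [1.0 / (1.0 + math.exp(a * m + b)) for m in margins]
    total = sum(raw)
    return [r / total for r in raw]

def fuse(score_lists, weights):
    """Weighted sum of per-class probabilistic scores (assumed fusion rule)."""
    n_classes = len(score_lists[0])
    weight_sum = sum(weights)
    return [
        sum(w * scores[c] for w, scores in zip(weights, score_lists)) / weight_sum
        for c in range(n_classes)
    ]

def classify(score_lists, weights, labels):
    """Return the label with the highest fused score, plus the fused scores."""
    fused = fuse(score_lists, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused

# Hypothetical scores for one utterance from two subsystems.
labels = ["female", "male"]
gmm_scores = softmax([-41.2, -43.0])      # made-up GMM log-likelihoods
svm_scores = sigmoid_scores([0.3, -0.3])  # made-up SVM margins
label, fused = classify([gmm_scores, svm_scores], [0.5, 0.5], labels)
print(label, [round(p, 3) for p in fused])
```

Mapping every subsystem's output to a common probabilistic scale before combining is what makes the scores comparable; without it, raw log-likelihoods and SVM margins would live on incompatible ranges.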
