I-vector based speaker gender recognition

Automatic gender recognition is becoming increasingly important across a range of potential applications. Many state-of-the-art gender recognition approaches based on a variety of biometrics, such as face, body shape, and voice, have been proposed recently. Among them, relying on voice is suboptimal because of significant variations in pitch, emotion, and noise in real-world speech. Inspired by the speaker recognition approaches based on the i-vector representation in the NIST SRE, we posit that the i-vector carries gender information as part of a speaker's characteristics, and that it therefore serves gender recognition as well as speaker recognition in complex environments. We thus apply total variability space analysis to gender classification and propose i-vector based discrimination for speaker gender recognition. Experiments on the TIMIT corpus and the NUST603_2014 database show that the proposed i-vector based speaker gender recognition improves performance to as high as 99.9% and surpasses the pitch method and the UBM-SVM baseline subsystems in terms of accuracy.
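To make the idea concrete: in the standard total variability formulation, an utterance's GMM mean supervector is modeled as M = m + Tw, where m is the UBM supervector, T is the total variability matrix, and the low-dimensional vector w is the i-vector. The abstract does not specify how the i-vectors are discriminated, so the following is only a minimal sketch under stated assumptions: the i-vectors are taken as already extracted (here replaced by synthetic stand-ins), and the back end is a simple length-normalized cosine scoring against one centroid per gender. All function names, the dimensionality, and the synthetic data are hypothetical and are not taken from the paper.

```python
import math
import random


def length_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def centroid(vectors):
    """Average a list of equal-length vectors and length-normalize the mean."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return length_normalize(mean)


def cosine_score(a, b):
    """Dot product of two unit-length vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def train_gender_centroids(ivectors, labels):
    """Build one unit-length centroid per gender label from training i-vectors."""
    by_label = {}
    for vec, label in zip(ivectors, labels):
        by_label.setdefault(label, []).append(length_normalize(vec))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def classify_gender(ivector, centroids):
    """Return the label whose centroid has the highest cosine score."""
    unit = length_normalize(ivector)
    return max(centroids, key=lambda label: cosine_score(unit, centroids[label]))


def make_synthetic_ivectors(n_per_class, dim=20, shift=3.0, seed=0):
    """ASSUMPTION: stand-in data, not real i-vectors.

    Each class is Gaussian noise plus a class-dependent offset on the
    first five dimensions, mimicking a gender-dependent direction.
    """
    rng = random.Random(seed)
    ivectors, labels = [], []
    for label, sign in (("female", 1.0), ("male", -1.0)):
        for _ in range(n_per_class):
            vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            for i in range(5):
                vec[i] += sign * shift
            ivectors.append(vec)
            labels.append(label)
    return ivectors, labels


def accuracy(ivectors, labels, centroids):
    """Fraction of i-vectors whose predicted label matches the true label."""
    hits = sum(classify_gender(v, centroids) == y for v, y in zip(ivectors, labels))
    return hits / len(labels)


if __name__ == "__main__":
    train_x, train_y = make_synthetic_ivectors(100, seed=1)
    test_x, test_y = make_synthetic_ivectors(50, seed=2)
    model = train_gender_centroids(train_x, train_y)
    print("accuracy on synthetic data:", accuracy(test_x, test_y, model))
```

With real data, the synthetic generator would be replaced by i-vectors produced by a trained extractor, and the centroid scoring could be swapped for another discriminative back end; the accuracy on synthetic data says nothing about the figures reported in the abstract.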
