Improvement on automatic speaker gender identification using classifier fusion

In this paper, a two-layer classifier fusion technique is proposed for automatic gender identification (AGI). The first layer is an acoustic classification layer that maps the mel-frequency cepstral coefficient (MFCC) acoustic feature space to a score space. In this layer, a divisive clustering method is proposed to divide the speakers of each gender into classes of speakers with similar vocal articulatory characteristics. The second layer is a back-end classifier that takes as input the vectors of fused likelihood scores produced by the first layer. Gaussian mixture model (GMM), support vector machine (SVM), and multilayer perceptron (MLP) classifiers were evaluated in the first (acoustic) and back-end layers. A gender classification accuracy of 96.53% was obtained on the OGI multilingual corpus, considerably higher than the accuracy of traditional AGI methods.
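The two-layer pipeline described above can be sketched roughly as follows. This is a minimal illustration under several assumptions that do not come from the paper: each speaker is represented by the mean of their feature frames; the divisive clustering is implemented as repeated 2-means splitting of the class with the largest within-class scatter; each speaker class is modeled by a single diagonal-covariance Gaussian instead of a full GMM; the fused score vector is simply the concatenation of per-class average log-likelihoods; and a nearest-centroid rule replaces the GMM, SVM, or MLP back-end. The 2-D synthetic features stand in for real MFCCs, and all function names are hypothetical.

```python
# Illustrative sketch: clustering rule, class models, back-end and data are assumptions.
import math
import random

def mean_vec(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def scatter(vectors):
    """Within-class scatter: summed squared distance to the class mean."""
    m = mean_vec(vectors)
    return sum(sq_dist(v, m) for v in vectors)

def two_means(points, idx, iters=20):
    """Split the speakers in `idx` into two groups by 2-means; None if impossible."""
    sub = [points[i] for i in idx]
    m = mean_vec(sub)
    c1 = max(sub, key=lambda p: sq_dist(p, m))   # seed 1: farthest from the mean
    c2 = max(sub, key=lambda p: sq_dist(p, c1))  # seed 2: farthest from seed 1
    best = None
    for _ in range(iters):
        g1 = [i for i in idx if sq_dist(points[i], c1) <= sq_dist(points[i], c2)]
        g2 = [i for i in idx if sq_dist(points[i], c1) > sq_dist(points[i], c2)]
        if not g1 or not g2:
            break
        best = (g1, g2)
        c1, c2 = mean_vec([points[i] for i in g1]), mean_vec([points[i] for i in g2])
    return best

def divisive_clustering(points, n_classes):
    """Top-down clustering: repeatedly split the class with the largest scatter."""
    classes = [list(range(len(points)))]
    while len(classes) < n_classes:
        classes.sort(key=lambda c: scatter([points[i] for i in c]))
        split = two_means(points, classes[-1])
        if split is None:
            break
        classes.pop()
        classes.extend(split)
    return classes

def fit_diag_gaussian(frames, floor=1e-3):
    """Single diagonal Gaussian per class, standing in for the class GMM."""
    mu = mean_vec(frames)
    var = [max(sum((f[d] - mu[d]) ** 2 for f in frames) / len(frames), floor)
           for d in range(len(mu))]
    return mu, var

def avg_loglik(model, frames):
    """Average per-frame log-likelihood of `frames` under a diagonal Gaussian."""
    mu, var = model
    total = sum(-0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                           for x, m, v in zip(f, mu, var)) for f in frames)
    return total / len(frames)

def score_vector(models, frames):
    """First layer: map an utterance's frames to a vector of per-class scores."""
    return [avg_loglik(model, frames) for model in models]

def train(speakers_by_gender, n_classes):
    """speakers_by_gender maps a gender label to a list of per-speaker frame lists."""
    models = []
    for gender in sorted(speakers_by_gender):
        spk = speakers_by_gender[gender]
        means = [mean_vec(frames) for frames in spk]  # one vector per speaker
        for cls in divisive_clustering(means, n_classes):
            models.append(fit_diag_gaussian([f for i in cls for f in spk[i]]))
    # Back-end: one centroid per gender in score space (stand-in for GMM/SVM/MLP).
    backend = {g: mean_vec([score_vector(models, fr) for fr in spk])
               for g, spk in speakers_by_gender.items()}
    return models, backend

def predict(models, backend, frames):
    s = score_vector(models, frames)
    return min(backend, key=lambda g: sq_dist(s, backend[g]))

def make_set(rng, n_per_group, n_frames=50):
    """Hypothetical 2-D 'MFCC-like' data with two speaker sub-groups per gender."""
    centers = {"male": [(-4, -2), (-4, 2)], "female": [(4, -2), (4, 2)]}
    data = {g: [] for g in centers}
    for g, cs in centers.items():
        for cx, cy in cs:
            for _ in range(n_per_group):
                sx, sy = cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)
                data[g].append([[sx + rng.gauss(0, 0.5), sy + rng.gauss(0, 0.5)]
                                for _ in range(n_frames)])
    return data

rng = random.Random(0)
train_set, test_set = make_set(rng, 5), make_set(rng, 5)
models, backend = train(train_set, n_classes=2)
results = [predict(models, backend, fr) == g
           for g, spk in test_set.items() for fr in spk]
correct, total = sum(results), len(results)
print(f"test accuracy: {correct}/{total}")
```

The design keeps the two layers separable: `score_vector` plays the role of the acoustic layer, and any classifier trained on its outputs (for example an SVM or MLP, as evaluated in the paper) could replace the nearest-centroid back-end without changing the first layer.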
