Dempster-Shafer Fusion Based Gender Recognition for Speech Analysis Applications

Speech signals carry valuable information about the speaker, including age, gender, and emotional state. Gender information can serve as an important preprocessing step for speech analysis applications such as adaptive human-machine interfaces, multi-modal security applications, and forensic systems based on intent and context analysis. In uncontrolled settings such as telephone speech applications, a gender recognition system must be adaptive, accurate, and robust to noise. This paper presents a reasoning method based on the Dempster-Shafer theory of evidence for automatic gender recognition from telephone speech. The proposed method uses mel-frequency cepstral coefficients with a support vector machine to generate initial predictions for individual speech segments. The reasoning scheme collects and validates the results from the support vector machine and treats only convincing predictions as valid evidence. It is argued that restricting the reasoning process to valid evidence improves recognition performance by excluding unconvincing classification results. Experiments conducted on large speech datasets indicate that the proposed scheme improves gender recognition performance for speech analysis applications.
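
The fusion step described above can be illustrated with a minimal sketch of Dempster's rule of combination over the two-class frame {male, female}. The abstract does not specify how classifier outputs are mapped to mass functions or how a prediction is judged convincing, so the following are assumptions made purely for illustration: each segment is assumed to come with a support vector machine probability estimate `p_male`, a hypothetical `reliability` factor discounts that estimate and assigns the remainder to the whole frame (`"either"`), and a hypothetical `threshold` on the larger class probability decides which segments count as valid evidence.

```python
from functools import reduce

def to_mass(p_male, reliability=0.9):
    """Turn a per-segment probability into a mass function (assumed mapping).

    Mass left over after discounting goes to the whole frame, "either".
    """
    return {
        "male": reliability * p_male,
        "female": reliability * (1.0 - p_male),
        "either": 1.0 - reliability,
    }

def combine(m1, m2):
    """Dempster's rule of combination for the frame {male, female}."""
    # Conflict: mass assigned to contradictory singleton pairs.
    conflict = m1["male"] * m2["female"] + m1["female"] * m2["male"]
    if conflict >= 1.0:
        raise ValueError("Totally conflicting evidence cannot be combined")
    norm = 1.0 - conflict
    return {
        "male": (m1["male"] * m2["male"]
                 + m1["male"] * m2["either"]
                 + m1["either"] * m2["male"]) / norm,
        "female": (m1["female"] * m2["female"]
                   + m1["female"] * m2["either"]
                   + m1["either"] * m2["female"]) / norm,
        "either": (m1["either"] * m2["either"]) / norm,
    }

def fuse_segments(p_male_per_segment, threshold=0.7, reliability=0.9):
    """Fuse only the segments whose prediction is convincing (assumed rule)."""
    convincing = [p for p in p_male_per_segment
                  if max(p, 1.0 - p) >= threshold]
    if not convincing:
        # No valid evidence: return the vacuous mass function.
        return {"male": 0.0, "female": 0.0, "either": 1.0}
    masses = [to_mass(p, reliability) for p in convincing]
    return reduce(combine, masses)

def decide(mass):
    """Pick the class with the larger combined mass."""
    return "male" if mass["male"] >= mass["female"] else "female"

if __name__ == "__main__":
    # Hypothetical per-segment probabilities for one utterance.
    fused = fuse_segments([0.9, 0.85, 0.55, 0.2])
    print(fused, decide(fused))
```

In this sketch the segment with probability 0.55 falls below the assumed threshold and is ignored, so it cannot dilute the evidence from the more confident segments. The threshold and reliability values are placeholders and would need tuning on held-out data.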
