Voice-based gender identification via multiresolution frame classification of spectro-temporal maps

This paper presents a novel approach to gender identification based on adaptive multiresolution (MR) classification of spectro-temporal maps. The speech signals are represented as images, mainly through auditory-inspired spectro-temporal representations: the mel-spectrogram, the cochleagram, and the auditory spectrogram. The 2-D representation of a segment of an utterance serves as the input to the system. The system places an MR decomposition in front of a generic classifier: feature extraction and classification are performed in each MR subspace, and the per-subspace decisions are then combined into a global decision by a weighting algorithm. The proposed method reaches an accuracy of up to 99%, significantly outperforming most common algorithms that combine pitch and acoustic features for gender identification.
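The pipeline described above (MR decomposition, per-subspace features, weighted combination of per-subspace decisions) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the MR transform, the features, the classifier, or the weighting algorithm, so a 2-D Haar decomposition, three simple statistics per subband, and a normalized weighted sum are used as stand-ins. All function names (`haar_level`, `mr_subspaces`, `features`, `combine`) are hypothetical.

```python
import numpy as np


def haar_level(img):
    """One level of a 2-D Haar-style decomposition (assumed stand-in for the MR transform).

    Returns the approximation (LL) and three detail subbands (LH, HL, HH).
    """
    # Crop to even dimensions so the 2x2 blocks tile the image exactly.
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2]
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0
    lh = (a + b - c - d) / 4.0
    hl = (a - b + c - d) / 4.0
    hh = (a - b - c + d) / 4.0
    return ll, lh, hl, hh


def mr_subspaces(img, levels=2):
    """Collect the detail subbands of every level plus the final approximation."""
    subspaces = []
    approx = np.asarray(img, dtype=float)
    for _ in range(levels):
        approx, lh, hl, hh = haar_level(approx)
        subspaces.extend([lh, hl, hh])
    subspaces.append(approx)
    return subspaces


def features(subband):
    """Hypothetical per-subspace features: mean, standard deviation, mean magnitude."""
    return np.array([subband.mean(), subband.std(), np.abs(subband).mean()])


def combine(scores, weights):
    """Weighted sum of per-subspace scores in [-1, 1]; the sign is the global decision."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return float(np.dot(weights, np.asarray(scores, dtype=float)))


# Usage: a random array stands in for a mel-spectrogram segment (frequency x time).
rng = np.random.default_rng(0)
segment = rng.random((64, 32))
subs = mr_subspaces(segment, levels=2)
feature_vectors = [features(s) for s in subs]  # one vector per MR subspace
```

In a full system, each feature vector would feed a classifier trained for its own subspace, and the weights passed to `combine` would be learned, for example from per-subspace validation accuracy. Both of those choices are assumptions here, since the abstract leaves them unspecified.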
