An analysis of the influence of acoustical adverse conditions on speaker gender identification

Speaker gender as a biometric feature plays an important role in numerous voice-based services. In this work we analyze the accuracy of a gender recognition system in different acoustic environments (indoor and outdoor auditory scenes). At the evaluation stage, each sentence was mixed with several types of background noise at various signal-to-noise ratio levels. The voiced parts of speech were then extracted and parametrized using features based on filter banks and vocal-tract properties. The resulting feature trajectories were non-linearly smoothed to minimize the influence of adverse conditions on the spoken sentences. The observed accuracy is acceptable for voice-based tasks whose performance can benefit from gender information.
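The noise-mixing and smoothing steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `mix_at_snr` and `median_smooth`, the looping of short noise clips, and the choice of a median filter as the non-linear smoother are all assumptions made for illustration, since the abstract does not name the smoothing method or the mixing procedure.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper: assumes both signals share a sampling rate and that
    the noise has non-zero power.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat or trim the noise so it covers the whole utterance (assumption).
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose gain g so that 10*log10(p_speech / (g**2 * p_noise)) == snr_db.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def median_smooth(trajectory, width=5):
    """Median-filter a 1-D feature trajectory (one possible non-linear smoother).

    `width` is assumed to be odd; edges are handled by repeating end values.
    """
    x = np.asarray(trajectory, dtype=float)
    half = width // 2
    padded = np.pad(x, half, mode="edge")
    return np.array([np.median(padded[i:i + width]) for i in range(len(x))])


# Usage with synthetic stand-ins for a sentence and a background recording.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 120 * np.arange(16000) / 16000)  # 1 s tone at 16 kHz
noise = rng.standard_normal(4000)                            # shorter noise clip
noisy = mix_at_snr(speech, noise, snr_db=10.0)
smoothed = median_smooth([1.0, 1.0, 9.0, 1.0, 1.0], width=3)
```

A median filter is used here only because it suppresses isolated outliers in a feature contour without blurring step changes; any other non-linear smoother could be substituted.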
