Age and gender classification using modulation cepstrum

This paper proposes using modulation cepstrum coefficients instead of cepstral coefficients for extracting metadata information such as age and gender. These coefficients are extracted by applying a discrete cosine transform to a time sequence of cepstral coefficients. The lower-order coefficients of this transform represent smooth cepstral trajectories over time. The results presented in this paper show that cepstral trajectories corresponding to lower modulation frequencies (3-14 Hz) provide the best discrimination. The proposed system achieves 50.2% overall accuracy on the 7-class age and gender classification task, whereas human labelers reach 54.7% accuracy on a subset of the evaluation material used in this work.
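
The extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the unnormalized DCT-II variant, the window length, and the 100 Hz frame rate are all assumptions chosen for illustration. The sketch takes a window of cepstral frames, applies a DCT along the time axis of each cepstral trajectory, and keeps only the lower-order coefficients.

```python
import math


def dct_ii(seq):
    """Unnormalized DCT-II of a 1-D sequence (assumed variant, for illustration)."""
    n = len(seq)
    return [
        sum(x * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
            for t, x in enumerate(seq))
        for k in range(n)
    ]


def modulation_cepstrum(cepstra, num_mod_coeffs):
    """Apply a DCT over time to each cepstral trajectory.

    cepstra: list of frames, each a list of cepstral coefficients.
    Returns one row per cepstral index, holding the first
    num_mod_coeffs DCT coefficients of that trajectory.
    """
    num_ceps = len(cepstra[0])
    # Regroup frame-major data into one time series per cepstral index.
    trajectories = [[frame[c] for frame in cepstra] for c in range(num_ceps)]
    return [dct_ii(traj)[:num_mod_coeffs] for traj in trajectories]


def modulation_frequency_hz(k, num_frames, frame_rate_hz):
    """Frequency of the k-th DCT-II basis function over a window of num_frames."""
    # Basis k completes k half-cycles across the window.
    return k * frame_rate_hz / (2 * num_frames)


# Hypothetical input: 8 frames with 2 cepstral coefficients each.
# The first trajectory is constant, the second rises linearly.
frames = [[1.0, float(t)] for t in range(8)]
mod_cep = modulation_cepstrum(frames, num_mod_coeffs=3)
```

Under these assumptions, a constant trajectory puts all of its energy into the zeroth coefficient, while slowly varying trajectories are captured by the first few coefficients. With a hypothetical 100 Hz frame rate and a 50-frame window, coefficient k corresponds to a modulation frequency of k Hz, so a band such as 3-14 Hz would map to a contiguous range of low-order coefficients.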
