Automatic speaker age and gender recognition in the car for tailoring dialog and mobile services

Car manufacturers face a new challenge: while a generation of “digital natives” is emerging as a new customer group, the challenges posed by an aging society continue to grow. This underlines the need for flexible in-car dialogs that take into account the specific needs and preferences of the respective user (group). In line with this year’s Interspeech motto, “Spoken Language Processing for All”, we address the question of how to determine which group the current user belongs to. We present a GMM/SVM-supervector system (a Gaussian Mixture Model combined with a Support Vector Machine) for speaker age and gender recognition, a technique adopted from state-of-the-art speaker recognition research. We furthermore describe an experimental study that evaluates the performance of the system and explores the selection of its parameters.
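As an illustration of the supervector idea, the following minimal sketch shows how one utterance could be mapped to a fixed-length vector. It follows the formulation commonly used in speaker recognition (MAP adaptation of the means of a background GMM, then stacking the adapted means). The abstract does not specify the system's configuration, so the toy background model, the relevance factor of 16, and all function and variable names here are assumptions made for illustration only. The resulting vector would then serve as the input to an SVM classifier, which is omitted here.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the background GMM means to one utterance and stack them.

    `relevance` is the MAP relevance factor (assumed value, not from the paper).
    """
    k, dim = len(weights), len(means[0])
    counts = [0.0] * k                       # soft frame count per component
    sums = [[0.0] * dim for _ in range(k)]   # posterior-weighted feature sums
    for x in frames:
        logp = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)                      # subtract max for numerical stability
        post = [math.exp(lp - top) for lp in logp]
        total = sum(post)
        for c in range(k):
            gamma = post[c] / total          # component posterior for this frame
            counts[c] += gamma
            for d in range(dim):
                sums[c][d] += gamma * x[d]
    sv = []
    for c in range(k):
        alpha = counts[c] / (counts[c] + relevance)  # data-dependent mixing weight
        for d in range(dim):
            data_mean = sums[c][d] / counts[c] if counts[c] > 0 else means[c][d]
            sv.append(alpha * data_mean + (1.0 - alpha) * means[c][d])
    return sv

# Hypothetical toy background model: 2 components, 2-dimensional features.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [5.0, 5.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]

# Hypothetical feature frames of one utterance (e.g. cepstral features).
frames = [[0.5, 0.5], [0.4, 0.6], [5.2, 4.8]]
sv = supervector(frames, ubm_weights, ubm_means, ubm_vars)
```

The supervector has one entry per component and feature dimension, so utterances of different lengths yield vectors of the same size, which is what makes them usable as SVM inputs.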
