Speaker Age Classification and Regression Using i-Vectors

In this paper, we examine the use of i-vectors for both age regression and age classification. Although i-vectors have previously been used for the age regression task, we extend this approach by fusing i-vector regression with acoustic-feature regression to estimate speaker age. This fusion yields a relative improvement of 12.6% compared to the i-vector-only system. We also use i-vectors for age classification, which to our knowledge is the first attempt to do so. Our best results reach an unweighted accuracy of 62.9%, a relative improvement of 16.7% over the best results obtained in the age classification task of the Age Sub-Challenge at Interspeech 2010.
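
The two quantities quoted above can be illustrated with a minimal sketch. It assumes that unweighted accuracy means the average of per-class recalls, so that every age class counts equally regardless of its size, and that relative improvement for an accuracy-like score is the gain divided by the baseline. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so each class contributes equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # Average recall over the classes that occur in y_true.
    return sum(correct[c] / total[c] for c in total) / len(total)


def relative_improvement(new, baseline):
    """Relative gain of a higher-is-better score over a baseline."""
    return (new - baseline) / baseline


# Toy, made-up labels: an imbalanced two-class example.
y_true = ["child", "child", "child", "child", "senior", "senior"]
y_pred = ["child", "child", "child", "child", "senior", "adult"]

ua = unweighted_accuracy(y_true, y_pred)  # recalls are 1.0 and 0.5
```

For an error measure where lower is better, the sign of the numerator would be flipped; the abstract does not state which measure underlies the 12.6% figure, so that case is left out here.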
