Audio Features Selection for Automatic Height Estimation from Speech

Aiming at the automatic estimation of a person's height from speech, we investigate the applicability of various subsets of speech features, formed by ranking the relevance and individual quality of numerous audio features. Specifically, based on a relevance ranking of the large set of openSMILE audio descriptors, we selected subsets of different sizes and evaluated them on the height estimation task. In brief, during speech parameterization, every input utterance is converted to a single feature vector consisting of 6552 parameters. Next, a subset of this feature vector is fed to a support vector machine (SVM)-based regression model, which directly estimates the height of an unknown speaker. The experimental evaluation on the TIMIT database demonstrated that (i) the feature vector composed of the top-50 ranked parameters provides a good trade-off between computational demands and accuracy, and (ii) the best accuracy, in terms of mean absolute error and root mean square error, is observed for the top-200 subset.
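
The rank-then-select procedure and the two error measures named above can be illustrated with a minimal sketch. It is not the authors' implementation: the relevance score used here (absolute Pearson correlation with height) is a stand-in assumption, since the abstract does not name the ranking criterion; the function names and toy data are hypothetical; and the SVM regression step is omitted.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_features(X, y):
    """Return feature indices sorted from most to least relevant.

    Relevance here is |Pearson r| with the target: an assumed stand-in
    for the ranking criterion, which the abstract does not specify.
    """
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def select_top_k(X, ranking, k):
    """Keep only the k best-ranked columns of X, in ranking order."""
    keep = ranking[:k]
    return [[row[j] for j in keep] for row in X]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy data (hypothetical): 40 "speakers", 4 features, only feature 1 tracks height.
rng = random.Random(0)
heights = [rng.uniform(150, 195) for _ in range(40)]
X = [
    [rng.gauss(0, 1), 0.05 * h + rng.gauss(0, 0.1), rng.gauss(0, 1), rng.gauss(0, 1)]
    for h in heights
]

ranking = rank_features(X, heights)
X_top = select_top_k(X, ranking, k=2)  # reduced vectors that would feed the regressor
```

In the setting described above, `X` would hold the 6552 openSMILE parameters per utterance and `k` would be swept over the subset sizes (for example 50 or 200), with `mae` and `rmse` computed on the regression model's predictions for held-out speakers.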
