An estimate of physical scale from speech

We present an algorithm, based on expectation-maximization (EM), that simultaneously estimates physical scale and vowel identities from a segment of speech. The validity of the algorithm rests on the scale hypothesis: that the variation in formant frequencies for a given vowel is determined mainly by the physical size of the speaker. The work is therefore both a new application and a new confirmation of a hypothesis that is often accepted without proof.
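To make the joint estimation concrete, the following is a minimal sketch of one way such an EM loop could look. It is not the authors' implementation. It assumes (none of this is stated in the abstract) that the scale hypothesis is modelled as a single additive shift `s` in log-frequency applied to a set of reference vowel templates, that observation noise is isotropic Gaussian with a fixed `sigma`, and that all vowels are equally likely a priori. The function name `estimate_scale` and its parameters are hypothetical.

```python
import numpy as np


def estimate_scale(log_formants, log_templates, sigma=0.1, n_iter=50):
    """Jointly estimate a log scale factor and soft vowel identities.

    log_formants:  (T, K) array of observed log formant frequencies per frame.
    log_templates: (V, K) array of reference log formants, one row per vowel
                   (hypothetical reference values supplied by the caller).
    sigma:         assumed noise standard deviation in the log domain.
    Returns (s, resp): the estimated log scale shift and a (T, V) array of
    posterior vowel probabilities for each frame.
    """
    T, K = log_formants.shape
    # Difference between every frame and every template, shape (T, V, K).
    diff = log_formants[:, None, :] - log_templates[None, :, :]
    s = 0.0  # start from "no scaling"
    for _ in range(n_iter):
        # E-step: posterior over vowel identity given the current scale.
        log_lik = -0.5 * np.sum((diff - s) ** 2, axis=2) / sigma**2
        log_lik -= log_lik.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: the scale that maximizes the expected log-likelihood is the
        # responsibility-weighted mean residual (responsibilities sum to 1
        # per frame, so the normalizer is T * K).
        s = np.sum(resp[:, :, None] * diff) / (T * K)
    return s, resp
```

Under these assumptions, `np.exp(s)` is the multiplicative factor by which the speaker's formants differ from the templates, and `resp.argmax(axis=1)` gives the most probable vowel for each frame. As with any EM procedure, the result can depend on the starting point, so the initial value of `s` may matter when the true scale is far from the reference.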
