Large-scale analysis of formant frequency estimation variability in conversational telephone speech

We quantify how the telephone channel and regional dialect influence formant estimates extracted with Wavesurfer [1, 2] from spontaneous conversational speech of over 3,600 native speakers of American English. To the best of our knowledge, this is the largest-scale study on this topic. We found that F1 estimates are higher in cellular channels than in landline channels, while F2 estimates generally show the opposite trend. We also characterized vowel shift trends in the northern U.S. states and compared them with the Northern city chain shift (NCCS) [3]. Our analysis is useful in forensic applications, where it is important to distinguish between speaker, dialect, and channel characteristics.

[1]  F. James Statistical Methods in Experimental Physics , 1973 .

[2]  David Miller,et al.  The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text , 2004, LREC.

[3]  Thomas Niesler,et al.  The 1998 HTK system for transcription of conversational telephone speech , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[4]  Polina Golland,et al.  Permutation Tests for Classification: Towards Statistical Significance in Image-Based Studies , 2003, IPMI.

[5]  Cynthia G. Clopper,et al.  Acoustic characteristics of the vowel systems of six regional varieties of American English. , 2005, The Journal of the Acoustical Society of America.

[6]  D. Talkin Speech formant trajectory estimation using dynamic programming with modulated transition costs , 1987 .

[7]  William Labov,et al.  The atlas of North American English : phonetics, phonology and sound change : a multimedia reference tool , 2006 .

[8]  J. Hillenbrand,et al.  Acoustic characteristics of American English vowels. , 1994, The Journal of the Acoustical Society of America.

[9]  G. E. Peterson,et al.  Control Methods Used in a Study of the Vowels , 1951 .

[10]  Abeer Alwan,et al.  A Database of Vocal Tract Resonance Trajectories for Research in Speech Processing , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[11]  Hermann J. Künzel,et al.  Beware of the ‘telephone effect’: the influence of telephone transmission on the measurement of formant frequencies , 2001 .

[12]  Douglas A. Reynolds,et al.  Dialect recognition using adapted phonetic models , 2008, INTERSPEECH.

[13]  William A. Brenneman Statistics for Research , 2005, Technometrics.

[14]  Thomas F. Quatieri,et al.  Shape invariant time-scale and pitch modification of speech , 1992, IEEE Trans. Signal Process.

[15]  Hermann J. Künzel,et al.  Beware of the ‘telephone effect’: the influence of telephone transmission on the measurement of formant frequencies , 2001 .

[16]  Paul Foulkes,et al.  The ‘Mobile Phone Effect’ on Vowel Formants , 2004 .