The voice source in speech production: data, analysis and models

Analysis of the voice source with respect to voice quality is essential to understanding the human speech production system, and can lead to better speech models for a wide range of applications. However, because of the position of the vocal folds, source analysis is often hampered by the lack of direct observations with which to calibrate algorithms. In this dissertation, two approaches to voice source and voice quality analysis were pursued. In the first approach, the source waveform was extracted by analyzing glottal area waveforms from high-speed imaging of the vocal folds. These direct observations led to the development of a new source model, which is more accurate than existing models. A codebook search technique was then proposed to estimate the source signal from the acoustic data. Results were promising for a number of model parameters, such as the open quotient and speed of opening. However, error analysis showed that the algorithm required reasonable formant-frequency constraints, which may be difficult to obtain automatically in some cases. In the second approach, voice source related measures were used in three voice quality applications: voice source analysis, automatic gender classification and prosody analysis. In voice source analysis, acoustic measures were examined in the context of the voice source model parameters obtained by fitting the model to the glottal area waveforms. Results showed correlations between model parameters and the related acoustic measures, such as the asymmetry coefficient and harmonic-to-noise ratio measures. It was also shown that the model parameters and related acoustic measures were affected by the type of voice quality (pressed, normal and breathy). In gender classification, voice source related measures were found to be more helpful for younger (10- to 14-year-old) speakers, for whom traditional pitch and formant frequency features were less useful. Analysis of prosody showed that, among other things, features correlated with pitch accents were not necessarily centered at the target syllable, and depended on the position of other prosodic events.
