Estimation of the Voice Source from Speech Pressure Signals: Evaluation of an Inverse Filtering Technique Using Physical Modelling of Voice Production

Objective: The goal of the study is to use physical modelling of voice production to assess how well an inverse filtering method estimates the glottal flow from acoustic speech pressure signals. Methods: An automatic inverse filtering method is presented, and speech pressure signals are generated using physical modelling of voice production so as to obtain test vowels with a known glottal excitation waveform. The speech sounds produced consist of 4 different vowels, each with 10 different values of the fundamental frequency. Both the original glottal flows given by physical modelling and their estimates computed by inverse filtering are parametrised with two robust voice source parameters: the normalized amplitude quotient and the difference (in decibels) between the levels of the first and second harmonics. Results: For both extracted parameters, the error introduced by inverse filtering was, in general, small. The effect of the distortion caused by inverse filtering on the parameter values was clearly smaller than the change in the corresponding parameters when the phonation type was altered. The distortion was largest for high-pitched vowels with the lowest value of the first formant. Conclusions: The study shows that the proposed inverse filtering technique, combined with the extracted parameters, constitutes a voice source analysis tool that is able to measure voice source dynamics automatically with satisfactory accuracy.
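As an illustration of the two parameters named in the Methods, the following is a minimal Python sketch, not the study's implementation. It assumes the usual definition of the normalized amplitude quotient, NAQ = f_ac / (d_peak · T), where f_ac is the peak-to-peak flow amplitude, d_peak is the magnitude of the negative peak of the flow derivative, and T is the fundamental period. The harmonic level difference is estimated here by simple peak picking in a Hann-windowed spectrum, which is an assumption of this sketch. The function names `naq` and `h1_h2`, the known `f0` argument, and the synthetic test signal are all hypothetical.

```python
import numpy as np


def naq(flow, fs, f0):
    """Normalized amplitude quotient of a glottal flow segment.

    Assumes `flow` covers at least one full glottal cycle and that the
    fundamental frequency `f0` (Hz) is already known.
    """
    f_ac = flow.max() - flow.min()          # peak-to-peak (AC) flow amplitude
    d_peak = -np.min(np.diff(flow)) * fs    # magnitude of negative derivative peak
    period = 1.0 / f0                       # fundamental period T in seconds
    return f_ac / (d_peak * period)


def h1_h2(flow, fs, f0):
    """Level difference (dB) between the first and second harmonics.

    Harmonics are located by picking the largest spectral magnitude
    within +/- f0/2 of f0 and 2*f0 (an assumption of this sketch).
    """
    n = len(flow)
    windowed = (flow - flow.mean()) * np.hanning(n)
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    def harmonic_peak(target):
        band = (freqs > target - f0 / 2) & (freqs < target + f0 / 2)
        return spectrum[band].max()

    return 20.0 * np.log10(harmonic_peak(f0) / harmonic_peak(2 * f0))


if __name__ == "__main__":
    # Hypothetical test signal: a raised-cosine "flow" pulse train, not real data.
    fs, f0 = 8000, 100
    t = np.arange(int(0.1 * fs)) / fs
    flow = 0.5 * (1.0 - np.cos(2 * np.pi * f0 * t))
    print("NAQ:", naq(flow, fs, f0))
```

Comparing such values computed from a reference flow and from its inverse-filtered estimate gives the kind of per-parameter error the abstract reports.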
