Analysis of glottal inverse filtering in the presence of source-filter interaction

The validity of glottal inverse filtering (GIF) for obtaining a glottal flow waveform from a radiated pressure signal, in the presence and absence of source-filter interaction, was studied systematically. A driven surface model of vocal fold vibration was used to generate source signals. A one-dimensional wave reflection algorithm was used to solve for acoustic pressures in the vocal tract. Several test signals were generated with and without source-filter interaction at various fundamental frequencies and vowels. Linear Predictive Coding (LPC), Quasi Closed Phase (QCP), and Quadratic Programming (QPR) based algorithms, along with the supraglottal impulse response, were used to inverse filter the radiated pressure signals to obtain the glottal flow pulses. The accuracy of each algorithm was tested for its recovery of maximum flow declination rate (MFDR), peak glottal flow, open phase ripple factor, closed phase ripple factor, and mean squared error. The algorithms were also tested for their absolute relative errors in the normalized amplitude quotient, the quasi-open quotient, and the harmonic richness factor. The results indicated that the mean squared error decreased as the source-filter interaction level increased, suggesting that the inverse filtering algorithms perform better in the presence of source-filter interaction. All glottal inverse filtering algorithms predicted the open phase ripple factor better than the closed phase ripple factor of a glottal flow waveform, irrespective of the source-filter interaction level. Major prediction errors occurred in the estimation of the closed phase ripple factor, MFDR, peak glottal flow, normalized amplitude quotient, and quasi-open quotient. Feedback-related nonlinearity (source-filter interaction) affected the recovered signal primarily when fo was well below the first formant frequency of a vowel. The prediction error increased when fo was close to the first formant frequency because of the difficulty of estimating the precise values of the resonance frequencies, which was exacerbated by nonlinear kinetic losses in the vocal tract.
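To make the accuracy measures named above concrete, the following is a minimal sketch of how peak glottal flow, MFDR, the normalized amplitude quotient, mean squared error, and absolute relative error could be computed from one cycle of a flow waveform. It is an illustration only, not the study's implementation: the Rosenberg-style test pulse, the sampling rate, and all function and parameter names (`rosenberg_pulse`, `flow_metrics`, `open_frac`, `rise_frac`) are assumptions. The normalized amplitude quotient is taken in its usual form, the peak-to-peak flow amplitude divided by the product of the MFDR and the period.

```python
import numpy as np

def rosenberg_pulse(fs, fo, open_frac=0.6, rise_frac=0.67, peak=1.0):
    """One cycle of a Rosenberg-style glottal flow pulse (assumed test signal)."""
    n = int(round(fs / fo))            # samples per cycle
    t = np.arange(n) / n               # time normalized to the period
    tp = open_frac * rise_frac         # duration of the opening phase
    tn = open_frac * (1.0 - rise_frac) # duration of the closing phase
    u = np.zeros(n)                    # closed phase stays at zero flow
    rising = t < tp
    falling = (t >= tp) & (t < tp + tn)
    u[rising] = 0.5 * (1.0 - np.cos(np.pi * t[rising] / tp))
    u[falling] = np.cos(0.5 * np.pi * (t[falling] - tp) / tn)
    return peak * u

def flow_metrics(u, fs, fo):
    """Peak flow, MFDR, and normalized amplitude quotient of one flow cycle."""
    du = np.gradient(u) * fs           # flow derivative per second
    mfdr = -du.min()                   # magnitude of the steepest decline
    f_ac = u.max() - u.min()           # peak-to-peak (AC) flow amplitude
    naq = f_ac / (mfdr * (1.0 / fo))   # AC amplitude / (MFDR * period)
    return {"peak_flow": float(u.max()), "mfdr": float(mfdr), "naq": float(naq)}

def mean_squared_error(estimate, reference):
    """Sample-wise mean squared error between two equal-length waveforms."""
    return float(np.mean((np.asarray(estimate) - np.asarray(reference)) ** 2))

def abs_relative_error(estimate, reference):
    """Absolute relative error of a scalar parameter estimate."""
    return abs(estimate - reference) / abs(reference)

# Usage: compare a reference pulse with a deliberately scaled "estimate".
fs, fo = 44100, 100.0                  # assumed sampling rate and fo in Hz
reference = rosenberg_pulse(fs, fo)
estimate = 0.9 * reference             # stand-in for an inverse-filtered pulse
ref_m = flow_metrics(reference, fs, fo)
est_m = flow_metrics(estimate, fs, fo)
mse = mean_squared_error(estimate, reference)
mfdr_err = abs_relative_error(est_m["mfdr"], ref_m["mfdr"])
naq_err = abs_relative_error(est_m["naq"], ref_m["naq"])
```

In this toy comparison a pure amplitude scaling changes the peak flow and MFDR by the same factor but leaves the normalized amplitude quotient unchanged, which is why amplitude-normalized parameters are reported separately from the absolute ones.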
