Tuning of vocal tract model parameters for nasals using sensitivity functions.

Determining the cross-sectional areas of vocal tract models from linear predictive coding or autoregressive-moving-average analysis of vowel speech signals has been of research interest for several decades. To tune the shape of the vocal tract to a given set of formant frequencies, iterative methods using sensitivity functions have been developed. In this paper, the idea of sensitivity functions is extended to a three-tube model used in connection with nasals, and the energy-based sensitivity function is compared with a Jacobian-based sensitivity function for the branched-tube model. It is shown that the difference between the two functions is negligible if the sensitivity is taken with respect to the formant frequency only. Results are given for the iterative tuning of a three-tube vocal tract model, based on the sensitivity functions, for a nasal (/m/). It is shown that, besides the polar angle, the absolute value of the poles and zeros of the rational transfer function also needs to be considered in the tuning process. To test the effectiveness of the iterative solver, the steepest descent method is compared with the Gauss-Newton method. It is shown that the Gauss-Newton method converges faster if a good starting value for the iteration is given.
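The comparison between the two iterative solvers can be illustrated with a minimal sketch. It does not use the three-tube model of the paper: `model` below is a hypothetical two-parameter toy map standing in for the mapping from tube parameters to pole/zero targets, the Jacobian is approximated by forward differences, and the fixed step size `alpha`, the tolerance, and the starting point are assumptions chosen only for illustration.

```python
from typing import Callable, List, Tuple

Vec = List[float]


def model(x: Vec) -> Vec:
    # Hypothetical toy map (NOT the paper's three-tube model).
    return [x[0] ** 2 + x[1], x[0] + x[1] ** 2]


def residual(x: Vec, target: Vec) -> Vec:
    return [m - t for m, t in zip(model(x), target)]


def jacobian(x: Vec, h: float = 1e-6) -> List[Vec]:
    # Forward-difference Jacobian: J[i][j] = d model_i / d x_j.
    f0 = model(x)
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        xp = list(x)
        xp[j] += h
        fp = model(xp)
        for i in range(2):
            J[i][j] = (fp[i] - f0[i]) / h
    return J


def gradient(x: Vec, target: Vec) -> Tuple[Vec, List[Vec]]:
    # Returns g = J^T r (gradient of 0.5*||r||^2) together with J.
    r, J = residual(x, target), jacobian(x)
    g = [J[0][k] * r[0] + J[1][k] * r[1] for k in range(2)]
    return g, J


def steepest_descent_step(x: Vec, target: Vec, alpha: float = 0.05) -> Vec:
    # Fixed-step descent along -J^T r; alpha is an assumed step size.
    g, _ = gradient(x, target)
    return [x[0] - alpha * g[0], x[1] - alpha * g[1]]


def gauss_newton_step(x: Vec, target: Vec) -> Vec:
    # Solve the 2x2 normal equations (J^T J) dx = -J^T r in closed form.
    g, J = gradient(x, target)
    a = J[0][0] ** 2 + J[1][0] ** 2
    b = J[0][0] * J[0][1] + J[1][0] * J[1][1]
    d = J[0][1] ** 2 + J[1][1] ** 2
    det = a * d - b * b
    dx0 = (-g[0] * d + g[1] * b) / det
    dx1 = (-g[1] * a + g[0] * b) / det
    return [x[0] + dx0, x[1] + dx1]


def iterate(step: Callable[[Vec, Vec], Vec], x0: Vec, target: Vec,
            tol: float = 1e-8, max_iter: int = 10000) -> Tuple[Vec, int]:
    # Apply `step` until the residual norm drops below tol.
    x = list(x0)
    for n in range(max_iter):
        r = residual(x, target)
        if (r[0] ** 2 + r[1] ** 2) ** 0.5 < tol:
            return x, n
        x = step(x, target)
    return x, max_iter


if __name__ == "__main__":
    target = model([1.0, 2.0])   # targets generated from known parameters
    start = [0.8, 1.8]           # a "good" starting value near the solution
    x_gn, n_gn = iterate(gauss_newton_step, start, target)
    x_sd, n_sd = iterate(steepest_descent_step, start, target)
    print("Gauss-Newton:", x_gn, "iterations:", n_gn)
    print("Steepest descent:", x_sd, "iterations:", n_sd)
```

In this toy setting both solvers recover the generating parameters from the nearby starting point, with Gauss-Newton needing far fewer iterations. The sketch says nothing about behavior from poor starting values, where a plain Gauss-Newton step may fail to converge.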
