Articulatory Speech Synthesis from the Fluid Dynamics of the Vocal Apparatus

This book addresses articulatory speech synthesis based on computed vocal tract geometries and the basic physics of sound production within them. Unlike conventional analysis/synthesis methods built on the well-known source-filter model, which assumes that the excitation and the filter are independent, we treat the entire vocal apparatus as a single mechanical system that produces sound by means of fluid dynamics. The vocal apparatus is represented as a three-dimensional, time-varying mechanism, and sound propagation inside it is modeled as the non-planar propagation of acoustic waves through a viscous, compressible fluid described by the Navier-Stokes equations. We propose a combined minimum-energy and minimum-jerk criterion to compute the dynamics of the vocal tract during articulation. Theoretical error bounds and experimental results show that this method closely matches the phonetic target positions while avoiding abrupt changes in the articulatory trajectory. The vocal folds are set into aerodynamic oscillation by the flow of air from the lungs, and the modulated air stream then excites the moving vocal tract. The results show strong evidence of source-filter interaction. Based on these results, we propose that the articulatory speech production model has the potential to synthesize speech and to provide a compact parameterization of the speech signal that can be useful in a wide variety of speech signal processing problems.

Table of Contents: Introduction / Literature Review / Estimation of Dynamic Articulatory Parameters / Construction of Articulatory Model Based on MRI Data / Vocal Fold Excitation Models / Experimental Results of Articulatory Synthesis / Conclusion
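To make the minimum-jerk idea mentioned in the abstract concrete, here is a minimal sketch of the classical rest-to-rest minimum-jerk profile for a single articulatory parameter. It is not the book's combined minimum-energy and minimum-jerk criterion, whose exact form the abstract does not give. The function name `minimum_jerk`, the normalized parameter values, and the 200 ms duration are all assumptions chosen for illustration.

```python
def minimum_jerk(x0, xf, duration, t):
    """Position at time t on a rest-to-rest minimum-jerk path from x0 to xf.

    Classical quintic profile: velocity and acceleration are zero at both
    endpoints, so the parameter leaves and reaches its targets smoothly.
    """
    # Normalized time, clamped to [0, 1] so the path holds its endpoints.
    tau = min(max(t / duration, 0.0), 1.0)
    # Quintic blend that minimizes the integrated squared jerk.
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return x0 + (xf - x0) * s


# Hypothetical example: one normalized articulatory parameter moving from
# 0.0 to 1.0 over an assumed 200 ms, sampled every 20 ms.
trajectory = [minimum_jerk(0.0, 1.0, 0.2, k * 0.02) for k in range(11)]
```

The trajectory starts and ends exactly on its targets and changes monotonically in between, which is the qualitative behavior the abstract describes: close matches to target positions without abrupt changes along the way.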
