Kinematic formant-to-area mapping

Abstract: This article presents a method of formant-to-area mapping that directly calculates the time derivatives of the cross-sections and length of a vocal tract model so that the time derivatives of the model's eigenfrequencies match those of the observed formant frequencies. The vocal tract model is a concatenation of uniform tubelets whose cross-section areas and lengths can vary in time. The time derivatives of the tubelet parameters are obtained by solving a linear algebraic system of equations and are then numerically integrated to yield the cross-section and length movements. Because more than one area function is compatible with the observed formant frequencies, pseudo-energy constraints are used to determine a unique solution. The results show that the formant-matched movements of the tubelet cross-sections and lengths are smooth, and that the observed and model-generated formant frequencies agree to within 0.01 Hz.
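
The steps summarized in the abstract (solve a linear system for the parameter rates, then integrate them) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the forward model `toy_formants` is a made-up smooth function and not the article's tubelet model, the Jacobian is taken by finite differences, the minimum-norm pseudoinverse solution stands in for the pseudo-energy constraints, and forward Euler stands in for whatever integration scheme the article uses. With this toy setup the tracking error is far coarser than the 0.01 Hz reported in the abstract.

```python
import numpy as np

SPEED_OF_SOUND = 35000.0  # cm/s; assumed value for illustration


def toy_formants(params, n_formants=3):
    """Hypothetical forward model (not the article's): maps
    params = [area_1..area_N, total_length] to formant frequencies in Hz."""
    areas, length = params[:-1], params[-1]
    n = np.arange(1, n_formants + 1)
    # Quarter-wavelength resonances of a uniform tube of the given length.
    base = (2 * n - 1) * SPEED_OF_SOUND / (4.0 * length)
    # Made-up smooth sensitivity of each formant to each tubelet area.
    x = (np.arange(areas.size) + 0.5) / areas.size
    sens = np.cos((2 * n[:, None] - 1) * np.pi * x[None, :]) / areas.size
    return base * (1.0 + sens @ np.log(areas))


def jacobian(f, params, eps=1e-6):
    """Central finite-difference Jacobian of f at params."""
    f0 = f(params)
    J = np.empty((f0.size, params.size))
    for j in range(params.size):
        step = np.zeros_like(params)
        step[j] = eps * max(1.0, abs(params[j]))
        J[:, j] = (f(params + step) - f(params - step)) / (2.0 * step[j])
    return J


def parameter_rates(params, formant_rates):
    """Solve J @ dp/dt = dF/dt. The system is underdetermined, so the
    minimum-norm solution is picked here as an assumed stand-in for the
    pseudo-energy constraints."""
    J = jacobian(toy_formants, params)
    return np.linalg.pinv(J) @ formant_rates


def track(params0, observed, dt):
    """Forward-Euler integration of the parameter rates along a formant track."""
    params = params0.copy()
    history = [params.copy()]
    for k in range(len(observed) - 1):
        formant_rates = (observed[k + 1] - observed[k]) / dt
        params = params + dt * parameter_rates(params, formant_rates)
        history.append(params.copy())
    return np.array(history)


def demo(n_steps=400):
    """Track a linear formant glide; return the worst formant mismatch in Hz."""
    params0 = np.concatenate([np.ones(8), [17.5]])  # 8 tubelets, 17.5 cm tract
    start = toy_formants(params0)
    end = start + np.array([30.0, -60.0, 40.0])
    w = np.linspace(0.0, 1.0, n_steps + 1)[:, None]
    observed = (1.0 - w) * start + w * end
    history = track(params0, observed, dt=1e-3)
    modelled = np.array([toy_formants(p) for p in history])
    return np.abs(modelled - observed).max()
```

Because only derivatives are matched, any integration error accumulates along the track; calling `demo()` with a larger `n_steps` shrinks the mismatch in this sketch.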
