A glottal chink model for the synthesis of voiced fricatives

This paper presents a simulation framework that integrates a glottal chink model into a time-domain continuous speech synthesizer alongside self-oscillating vocal folds. The glottis is thus made up of two separate components: a self-oscillating part and a constantly open chink. This arrangement allows voiced fricatives to be simulated: the self-oscillating model of the vocal folds generates the voiced source, and the chink provides the glottal opening needed to generate the frication noise. Numerical simulations show that the model accurately simulates voiced fricatives as well as phonetic assimilation phenomena such as sonorization and devoicing. The simulation framework is also used to show that the phonatory/articulatory space for generating voiced fricatives depends on the desired sound: for instance, the minimal glottal opening for generating frication noise is shorter for /z/ than for /ʒ/.
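The two-component glottis described above can be illustrated with a minimal sketch. It is not the paper's model: the self-oscillating folds are replaced here by a half-wave rectified sinusoid as a stand-in, and the function names, the fundamental frequency, and the area values are all hypothetical choices made for illustration. The only point shown is that adding a constantly open chink in parallel keeps the total glottal area above zero throughout the cycle, which is what leaves airflow available for frication noise.

```python
import math


def glottal_area(t, f0=120.0, max_osc_area=0.15, chink_area=0.05):
    """Total glottal area (cm^2) at time t (s).

    All parameter values are illustrative assumptions, not values
    taken from the paper.
    """
    # Stand-in for the self-oscillating part: a half-wave rectified
    # sinusoid that closes completely during half of each cycle.
    osc_area = max(0.0, max_osc_area * math.sin(2.0 * math.pi * f0 * t))
    # The chink is a constant opening acting in parallel with the folds.
    return osc_area + chink_area


def sample_areas(duration=0.02, fs=16000):
    """Sample the total area over `duration` seconds at rate `fs` (Hz)."""
    n = int(duration * fs)
    return [glottal_area(i / fs) for i in range(n)]


areas = sample_areas()
min_area = min(areas)  # never drops below the chink area
max_area = max(areas)  # oscillating peak plus the chink area
```

In this toy setting, setting `chink_area` to zero recovers a glottis that closes fully in each cycle, while a larger value raises the floor of the area function without changing its oscillating component.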
