Control Concepts for Articulatory Speech Synthesis

We present two concepts for the generation of gestural scores to control an articulatory speech synthesizer. Gestural scores are the common input to the synthesizer and constitute an organized pattern of articulatory gestures. The first concept generates the gestures for an utterance using the phonetic transcriptions, phone durations, and intonation commands predicted by the Bonn Open Synthesis System (BOSS) from an arbitrary input text. This concept extends the synthesizer to a text-to-speech synthesis system. The idea of the second concept is to use timing information extracted from Electromagnetic Articulography signals to generate the articulatory gestures. Therefore, it is a concept for the re-synthesis of natural utterances. Finally, application prospects for the presented synthesizer are discussed.
