A mixed inventory structure for German concatenative synthesis

In speech synthesis by unit concatenation, a central issue is the definition of the unit inventory. Diphone and demisyllable inventories are widely used, but both unit types have drawbacks. This chapter describes a mixed inventory structure that is syllable-oriented but does not require a definite decision about the position of a syllable boundary. The inventory was defined using the results of a comprehensive investigation of coarticulatory phenomena at syllable boundaries, together with a machine-readable pronunciation dictionary. An evaluation comparing the mixed inventory with a demisyllable inventory and a diphone inventory confirms that speech generated with the mixed inventory is superior in general acceptance. A segmental intelligibility test shows that the synthetic speech is highly intelligible.
