Making Sense of Variations: Introducing Alternatives in Speech Synthesis

This paper addresses the use of speech alternatives to enrich speech synthesis systems. Speech alternatives denote the variety of strategies that a speaker can use to pronounce a sentence, depending on pragmatic constraints, speaking style, and speaker-specific strategies. During training, the symbolic and acoustic characteristics of a unit-selection speech synthesis system are statistically modelled with context-dependent parametric models (GMMs/HMMs). During synthesis, symbolic and acoustic alternatives are exploited using a generalized Viterbi algorithm (GVA) to determine the sequence of speech units used for synthesis. Objective and subjective evaluations provide evidence that the use of speech alternatives significantly improves speech synthesis over conventional speech synthesis systems.
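
To make the idea of searching over alternatives concrete, the following is a minimal sketch of a list-type generalized Viterbi search, in which each candidate unit keeps its k best partial paths instead of a single survivor. It is an illustration under stated assumptions, not the authors' implementation: the function name `generalized_viterbi`, the parameter `k`, and the toy cost tables `TARGET` and `JOIN` are hypothetical, and in the system described above the costs would presumably be derived from the trained GMM/HMM models rather than from lookup tables.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Path = Tuple[float, List[str]]  # (accumulated cost, unit sequence)


def generalized_viterbi(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    join_cost: Callable[[str, str], float],
    k: int = 2,
) -> List[Path]:
    """Return up to k lowest-cost unit sequences, best first.

    candidates[t] lists the candidate units for position t.
    Each candidate keeps its k best incoming partial paths
    (k = 1 reduces to the classical Viterbi search).
    """
    # One survivor list per candidate unit at the first position.
    survivors = [[(target_cost(0, u), [u])] for u in candidates[0]]
    for t in range(1, len(candidates)):
        new_survivors = []
        for u in candidates[t]:
            # Extend every surviving path from the previous position to u.
            extended = [
                (cost + join_cost(path[-1], u) + target_cost(t, u), path + [u])
                for paths in survivors
                for cost, path in paths
            ]
            # Keep only the k cheapest paths ending in u.
            new_survivors.append(
                heapq.nsmallest(k, extended, key=lambda e: e[0])
            )
        survivors = new_survivors
    final = [entry for paths in survivors for entry in paths]
    return heapq.nsmallest(k, final, key=lambda e: e[0])


# Hypothetical toy costs: two positions, two candidate units each.
TARGET = {"a1": 1, "a2": 2, "b1": 1, "b2": 0}
JOIN = {("a1", "b1"): 0, ("a1", "b2"): 3, ("a2", "b1"): 2, ("a2", "b2"): 1}
candidates = [["a1", "a2"], ["b1", "b2"]]

best = generalized_viterbi(
    candidates,
    target_cost=lambda t, u: TARGET[u],
    join_cost=lambda prev, u: JOIN[(prev, u)],
    k=2,
)
for cost, path in best:
    print(cost, path)
```

Keeping k survivors per candidate, rather than one, is what lets the search return several complete alternatives for the same sentence; the cost of this is that each position does roughly k times the work of the classical algorithm.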
