Emotional Speech Synthesis using Subspace Constraints in Prosody

An efficient speech synthesis method that uses a subspace constraint on prosody is proposed. Conventional unit selection methods concatenate speech segments stored in a database, which requires an enormous number of waveforms to synthesize various emotional expressions for arbitrary texts. The proposed method employs principal component analysis to reduce the dimensionality of the prosodic components, which also allows us to generate new speech similar to the training samples. The subspace constraint ensures that the prosodic components of the synthesized speech, including F0, power, and speech length, preserve the correlations observed in the training samples of emotional speech. We assume that the combination of the number of syllables and the accent type determines the correlative dynamics of prosody, and we construct a separate subspace for each such combination. The subspace is then linearly related to emotions through multiple regression analysis, using emotion ratings obtained by subjective evaluation of the training samples. Experimental results demonstrated that only four dimensions were sufficient to represent the prosodic changes due to emotion, accounting for over 90% of the total variance. The synthesized emotions were successfully recognized by listeners of the synthesized speech, especially "anger", "surprise", "disgust", "sorrow", "boredom", "depression", and "joy".
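
The pipeline described above (per-class PCA subspace plus a linear map from emotion ratings to subspace coordinates) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes prosodic feature vectors (F0, power, duration parameters) and subjective emotion ratings are already available for one (syllable-count, accent-type) class, the regression direction (ratings to coordinates) is an assumption made for synthesis, and all variable names are hypothetical.

```python
# Minimal sketch of a subspace-constrained prosody model, assuming
# per-utterance prosodic feature vectors and emotion ratings exist.
# Placeholder data and names are illustrative, not from the paper.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# One (syllable-count, accent-type) class:
prosody_vectors = rng.normal(size=(100, 30))  # (n_samples, n_features)
emotion_ratings = rng.normal(size=(100, 7))   # (n_samples, n_emotions)

# 1. PCA reduces prosody to a low-dimensional subspace; the abstract
#    reports ~4 dimensions covering over 90% of the total variance.
pca = PCA(n_components=4)
subspace_coords = pca.fit_transform(prosody_vectors)

# 2. Multiple regression linearly relates emotion ratings to the
#    subspace coordinates (direction assumed here for synthesis).
regression = LinearRegression()
regression.fit(emotion_ratings, subspace_coords)

# 3. Synthesis: map a target emotion vector into the subspace, then
#    back to full prosodic parameters.  Staying inside the PCA
#    subspace is what preserves the trained correlations among
#    F0, power, and speech length.
target_emotion = np.zeros((1, 7))
target_emotion[0, 0] = 1.0                    # e.g. "anger" scale
coords = regression.predict(target_emotion)
synthesized_prosody = pca.inverse_transform(coords)
```

In practice, one such (PCA, regression) pair would be trained for every combination of syllable count and accent type, as the abstract states, and the synthesized prosodic parameters would then drive the waveform generation stage.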