Analysis and modification of spectral energy for neutral to sad emotion conversion

This work explores the spectral energies of neutral, sad and angry speech, and analyzes the potential of spectral energy modification to convert neutral speech to sad/angry speech. A method of modifying the spectral energy of neutral speech signals based on a filter bank implementation is proposed for the purpose of converting a given neutral speech to a target emotional speech. Since pitch plays a vital role in emotion expression, we modify the pitch contour first by using the method of Gaussian normalization. This is followed by modification of spectral energy using a method proposed in this paper. The expressiveness of the resultant speech is compared with speech obtained by modifying only the pitch contour, and we have observed improvements in expressiveness due to incorporation of proposed spectral energy modification. The method is found to be quite good for neutral to sad conversion. However, the quality of conversion to anger is not good, and the reasons behind this are analyzed.

[1]  K. Sreenivasa Rao,et al.  Designing prosody rule-set for converting neutral TTS speech to storytelling style speech for Indian languages: Bengali, Hindi and Telugu , 2014, 2014 Seventh International Conference on Contemporary Computing (IC3).

[2]  Eric Moulines,et al.  Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones , 1989, Speech Commun..

[3]  Dirk Heylen,et al.  Generating expressive speech for storytelling applications , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[4]  Aijun Li,et al.  Prosody conversion from neutral speech to emotional speech , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[5]  S. R. Mahadeva Prasanna,et al.  Neutral to Target Emotion Conversion Using Source and Suprasegmental Information , 2011, INTERSPEECH.