Study on Emotional Speech Features in Korean with Its Application to Voice Conversion

Recent research in speech synthesis has focused mainly on naturalness, and emotional speech synthesis has become a prominent research topic. Although many studies of emotional speech in English and Japanese have been reported, studies in Korean are rare. This paper presents an analysis of emotional speech in Korean. Prosodic features of emotional speech, such as F0, duration, and amplitude, together with their variations, are examined. Their contribution to three typical types of emotional speech is quantified and modeled. Using the analysis results, emotional voice conversion from neutral speech to emotional speech is also performed and evaluated.
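The abstract does not spell out the conversion rules, so the following is only a rough illustration of prosody-based neutral-to-emotional conversion, not the paper's method. It assumes that a frame-level F0 contour (with 0 marking unvoiced frames) and a waveform are already available, and that target statistics for an emotion would be estimated from emotional recordings. Utterances are summarized by utterance-level statistics. Neutral prosody is then mapped toward the target in three ways: log-F0 mean-variance mapping, a global amplitude gain, and a global duration factor. All names (`ProsodyStats`, `prosody_stats`, `convert_f0`, `convert_amplitude`, `duration_scale`) and the choice of mapping are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class ProsodyStats:
    """Utterance-level prosodic statistics (hypothetical feature set)."""
    logf0_mean: float  # mean of log F0 over voiced frames
    logf0_std: float   # std of log F0 over voiced frames (pitch variation)
    duration: float    # utterance length in seconds
    rms: float         # root-mean-square amplitude of the waveform


def prosody_stats(f0, samples, sample_rate):
    """Summarize a frame-level F0 contour (0 = unvoiced) and a waveform."""
    log_voiced = [math.log(f) for f in f0 if f > 0]
    if not log_voiced or not samples:
        raise ValueError("need at least one voiced frame and one sample")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return ProsodyStats(
        logf0_mean=mean(log_voiced),
        logf0_std=pstdev(log_voiced),
        duration=len(samples) / sample_rate,
        rms=rms,
    )


def convert_f0(f0, src, tgt):
    """Map voiced F0 values so their log-F0 mean and std match the target.

    Working in the log domain keeps converted values positive.
    Unvoiced frames (0) are left unvoiced.
    """
    scale = tgt.logf0_std / src.logf0_std if src.logf0_std > 0 else 1.0
    return [
        math.exp(tgt.logf0_mean + (math.log(f) - src.logf0_mean) * scale) if f > 0 else 0.0
        for f in f0
    ]


def convert_amplitude(samples, src, tgt):
    """Apply one global gain so the output RMS matches the target RMS."""
    gain = tgt.rms / src.rms if src.rms > 0 else 1.0
    return [x * gain for x in samples]


def duration_scale(src, tgt):
    """Global time-stretch factor (target duration / source duration)."""
    return tgt.duration / src.duration
```

The sketch only computes the duration factor. Applying it to the waveform needs a time-scale modification method such as PSOLA. Per-syllable or per-phoneme durations and F0 contour shapes would model emotion more closely than these global statistics.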
