Acoustic nature and perceptual testing of corpora of emotional speech
暂无分享,去创建一个
This paper proposes three corpora of emotional speech in Japanese that maximize the expression of each emotion (expressing joy, anger, and sadness) for use with CHATR, the concatenative speech synthesis system being developed at ATR. A perceptual experiment was conducted using the synthesized speech generated from each emotion corpus and the results proved to be significantly identifiable. Authors’ current work is to identify the local acoustic features relevant for specifying a particular emotion type. F0 and duration showed significant differences among emotion types. AV (amplitude of voicing source) and GN (glottal noise) also showed differences. This paper reports on the corpus design, the perceptual experiment, and the results of the acoustic analysis.
[1] Yoshinori Kitahara,et al. Prosodic components of speech in the expression of emotions , 1988 .
[2] Iain R. Murray,et al. Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. , 1993, The Journal of the Acoustical Society of America.