Emotion recognition and synthesis system on speech

A system capable of both recognising and synthesising emotional content in speech is developed. First, the relation between the physical features of emotional speech and the emotional content perceived by listeners is estimated using linear statistical methods and built into the system. The system then performs emotion recognition and synthesis through a simple linear operation on this relation information. In the system, the pitch contour is expressed by the seven-parameter model proposed by Hirose, Fujisaki & Yamaguchi (1984), the power envelope is approximated by five line segments (11 parameters), and PSOLA (Pitch-Synchronous OverLap and Add) is applied to synthesise the speech. A set of emotional words with very little correlation among them was selected on the basis of preliminary statistical experiments. The relation information was verified as significant, and the experimental results show that the system was able to recognise and synthesise emotional content in speech just as human subjects did.
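The abstract does not give the estimation procedure, so the following is only a minimal sketch of how one linear relation could serve both directions. It assumes ordinary least squares for the estimation and a pseudo-inverse for the synthesis direction; neither is confirmed by the source. The function names, the 18-element feature vector (7 pitch plus 11 power parameters concatenated), the three emotional-word ratings, and the random data are all hypothetical.

```python
import numpy as np


def fit_relation(features, ratings):
    """Least-squares estimate of W such that ratings ~= [features, 1] @ W.

    features: (n_utterances, n_features) acoustic parameters
    ratings:  (n_utterances, n_emotions) listener ratings per emotional word
    """
    # Append a constant column so the last row of W acts as a bias term.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    return W


def recognise(W, feature_vec):
    """Recognition: map one feature vector to predicted emotion ratings."""
    return np.append(feature_vec, 1.0) @ W


def synthesise(W, target_ratings):
    """Synthesis: minimum-norm feature vector whose predicted ratings
    match target_ratings (assumption: pseudo-inverse of the linear map)."""
    A, b = W[:-1], W[-1]  # A: (n_features, n_emotions), b: (n_emotions,)
    return np.linalg.pinv(A.T) @ (target_ratings - b)


def demo():
    # Hypothetical sizes: 40 utterances, 18 features, 3 emotional words.
    rng = np.random.default_rng(0)
    features = rng.normal(size=(40, 18))
    true_W = rng.normal(size=(19, 3))
    ratings = np.hstack([features, np.ones((40, 1))]) @ true_W

    W = fit_relation(features, ratings)
    target = np.array([1.0, 0.0, -1.0])
    params = synthesise(W, target)
    print("recovered ratings:", recognise(W, params))


if __name__ == "__main__":
    demo()
```

In this sketch the synthesised parameters would still have to be converted into a pitch contour and power envelope and rendered with PSOLA; that stage is not shown.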

[1] Iain R. Murray, et al. Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. The Journal of the Acoustical Society of America, 1993.

[2] Grant Fairbanks, et al. Recent Experimental Investigations of Vocal Pitch in Speech. 1940.

[3] E. Kramer. Judgment of personal characteristics and emotions from nonverbal properties of speech. Psychological Bulletin, 1963.

[4] R. Dawes, et al. A Proximity Analysis of Vocally Expressed Emotion. 1966.

[5] J. M. Cowan. Pitch and intensity characteristics of stage speech. 1936.

[6] Keikichi Hirose, et al. Synthesis by rule of voice fundamental frequency contours of spoken Japanese from linguistic information. ICASSP, 1984.

[7] Joel R. Davitz, et al. Correlates of Accuracy in the Communication of Feelings. 1959.

[8] H. Schlosberg. Three dimensions of emotion. Psychological Review, 1954.

[9] L. Streeter, et al. Acoustic and perceptual indicators of emotional stress. The Journal of the Acoustical Society of America, 1983.

[10] William Lord, et al. Speech Pitch Frequency as an Emotional State Indicator. IEEE Transactions on Systems, Man, and Cybernetics, 1975.

[11] Ebrahim Mamdani, et al. Applications of fuzzy algorithms for control of a simple dynamic plant. 1974.

[12] A. C. Rencher, et al. Fifty-four voices from two: the effects of simultaneous manipulations of rate, mean fundamental frequency, and variance of fundamental frequency on ratings of personality from speech. The Journal of the Acoustical Society of America, 1974.

[13] Hideo Saito, et al. Evaluation of the relationship between emotional concepts and emotional parameters on speech. 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997.

[14] K. Stevens, et al. Emotions and speech: some acoustical correlates. The Journal of the Acoustical Society of America, 1972.

[15] Sheldon B. Michaels, et al. Some Aspects of Fundamental Frequency and Envelope Amplitude as Related to the Emotional Content of Speech. 1962.