Automatic Spoken Affect Analysis and Classification

This paper reports results from early experiments on automatic classification of spoken affect. The task was to classify short spoken sentences into one of two affect classes: approving or disapproving. Using an optimal combination of six acoustic measurements, our classifier achieved an accuracy of 65% to 88% for speaker-dependent, text-independent classification. The results suggest that pitch and energy measurements may be used to classify spoken affect automatically, but more research will be necessary to understand individual variations and to broaden the range of affect classes that can be recognized. In a second experiment, we measured human performance in classifying the same speech samples and found similarities between the human and automatic classification results.
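The abstract names pitch and energy as the useful measurements but does not say how they were extracted or which six measurements were used. Purely as an illustration, the Python sketch below estimates those two quantities for a single frame. Everything in it is an assumption rather than the authors' method: the function names `frame_energy` and `autocorr_pitch` are hypothetical, pitch is estimated with a plain autocorrelation peak search, and the 8 kHz sample rate, 400-sample frame, and 60-400 Hz search range are arbitrary choices.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame (a simple energy measure)."""
    return sum(x * x for x in frame) / len(frame)


def autocorr_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate F0 in Hz from the largest autocorrelation peak.

    Only lags corresponding to frequencies in [f_min, f_max] are searched.
    Returns None when no lag has positive correlation (e.g. silence).
    """
    lag_lo = int(sample_rate / f_max)
    lag_hi = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(lag_lo, lag_hi + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


if __name__ == "__main__":
    # Synthetic stand-in for a voiced frame: a 200 Hz sine sampled at 8 kHz.
    sample_rate = 8000
    frame = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
    print("energy:", frame_energy(frame))
    print("pitch (Hz):", autocorr_pitch(frame, sample_rate))
```

In a setup like the one the abstract describes, per-frame values of this kind would presumably be summarized over each sentence (for example by their mean or range) before being passed to a classifier; that summarization step is also an assumption here.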
