Automatic spoken affect classification and analysis

This paper reports results from preliminary experiments on the automatic classification of spoken affect valence. The task was to classify short spoken sentences into one of two classes: approving or disapproving. Using an optimal combination of six acoustic measurements, our classifier achieved an accuracy of 65% to 88% for speaker-dependent, text-independent classification. The results suggest that pitch and energy measurements may be used to classify spoken affect valence automatically, but more research is needed to understand individual variations and to broaden the range of affect classes that can be recognized. In a second experiment, we measured human performance in classifying the same speech samples and found similarities between the human and automatic classification results.
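The abstract does not name the six acoustic measurements, so the following is only a minimal sketch of how pitch and energy statistics of the kind it mentions might be computed for one utterance. The function names, the four summary statistics, the autocorrelation pitch estimator, and the frame settings are all illustrative assumptions, not the authors' method.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def autocorr_pitch(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate F0 in Hz from the autocorrelation peak within [fmin, fmax].

    Returns 0.0 when no lag gives a positive correlation (treated as unvoiced).
    The search range is an assumed default, not a value from the paper.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def utterance_features(signal, sample_rate, frame_len=400, hop=200):
    """Summarize an utterance as four hypothetical pitch/energy statistics."""
    pitches, energies = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(frame_energy(frame))
        f0 = autocorr_pitch(frame, sample_rate)
        if f0 > 0.0:  # keep only frames judged voiced
            pitches.append(f0)
    if not energies:
        raise ValueError("signal is shorter than one frame")
    return {
        "mean_pitch": sum(pitches) / len(pitches) if pitches else 0.0,
        "pitch_range": max(pitches) - min(pitches) if pitches else 0.0,
        "mean_energy": sum(energies) / len(energies),
        "energy_range": max(energies) - min(energies),
    }


if __name__ == "__main__":
    # Synthetic stand-in for speech: 0.1 s of a 200 Hz sine sampled at 8 kHz.
    sr = 8000
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
    print(utterance_features(tone, sr))
```

Feature vectors like this could then be passed to any two-class classifier trained per speaker, in keeping with the speaker-dependent setup described above. Which features and which classifier work best is an empirical question this sketch does not address.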
