Emotion in user interface, voice interaction system

An approach toward a personalized voice-emotion user interface that works regardless of the speaker's age, sex or language is presented. An extensive set of carefully chosen utterances provided a speech database for investigating acoustic similarities among eight emotional states: (unemotional) neutral, anger, sadness, happiness, disgust, surprise, stress/trouble and fear. Based on those results, a voice interaction system (VIS) capable of sensing the user's emotional message was developed. To detect emotions, several primary parameters of human speech were analyzed: pitch, formants, tempo (rhythm) and voice power. First, the speaker's basic individual voice characteristics were extracted (pitch and/or formants in neutral speech, normal speech rate, neutral speech power), and based on those parameters the emotional message of the subject's utterance was successfully detected. The VIS interacts with the user, changing its responses according to the user's utterances.
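As a rough illustration of this baseline-relative analysis, the sketch below estimates an utterance's mean pitch and RMS power and compares them with the speaker's neutral values. It is written in Python, which the paper does not specify, and every function name, threshold and decision rule here is an illustrative assumption rather than the parameters actually used in the described system.

```python
# Minimal sketch: baseline-relative emotion cues from speech.
# Thresholds and rules are hypothetical, not taken from the paper.
import numpy as np

def estimate_pitch(frame, sr, fmin=75.0, fmax=400.0):
    """Rough F0 estimate for one frame via autocorrelation (Hz)."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), min(int(sr / fmin), len(corr) - 1)
    lag = lo + np.argmax(corr[lo:hi])
    return sr / lag

def utterance_features(signal, sr, frame_len=0.03):
    """Mean pitch over energetic frames and overall RMS power.
    Assumes the signal is normalized to roughly [-1, 1]."""
    n = int(frame_len * sr)
    frames = [signal[i:i + n] for i in range(0, len(signal) - n, n)]
    pitches = [estimate_pitch(f, sr) for f in frames if np.abs(f).max() > 0.01]
    power = float(np.sqrt(np.mean(signal ** 2)))
    return (float(np.mean(pitches)) if pitches else 0.0), power

def classify(pitch, power, neutral_pitch, neutral_power):
    """Toy decision rules relative to the speaker's neutral baseline."""
    pitch_ratio = pitch / neutral_pitch
    power_ratio = power / neutral_power
    if pitch_ratio > 1.3 and power_ratio > 1.5:
        return "high arousal (e.g. anger or happiness)"
    if pitch_ratio < 0.9 and power_ratio < 0.8:
        return "low arousal (e.g. sadness)"
    return "neutral"
```

In this spirit, a neutral calibration utterance would first be passed through utterance_features to obtain neutral_pitch and neutral_power; subsequent utterances are then compared against that per-speaker baseline, which is what makes the approach independent of the speaker's age, sex or language. A real system would add formant and tempo cues and a finer-grained classifier for the eight emotional states.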
