Joint-processing of audio-visual signals in human perception of conflicting synthetic character emotions

Expressive audio-visual synthetic characters are increasingly employed in research and commercial applications. However, the mechanisms people use to interpret conflicting or uncertain multimodal emotional displays from these agents are not yet well understood. This study seeks to better understand how conflicting expressive displays in the video and audio channels are interpreted, using a continuous dimensional evaluation framework of emotional valence, activation, and dominance. The results indicate that when two conflicting emotions are presented to subjects over the audio and video channels, the mean of the subjects' dimensional evaluations of the resulting emotional judgments lies between the audio-only and video-only emotion perceptual centers. Furthermore, the deviation from the audio-only center is proportional to the distance between the audio and video centers. This indicates that the perceptual judgment of conflicting emotions involves joint processing of the audio and video information, irrespective of the perceptual bias toward the audio channel. In general, the amount of interaction between the audio and video channels appears proportional to the emotional disparity between the two channels in the continuous emotional space considered in this study.
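One way to read the central finding is as a linear fusion model in the valence-activation-dominance space: the perceived center moves from the audio-only center toward the video-only center by a fraction of the distance between them. The sketch below is purely illustrative and not taken from the study; the weighting `w`, the function name, and the example coordinates are all hypothetical.

```python
import numpy as np

def predicted_percept(audio_center, video_center, w=0.4):
    """Illustrative linear fusion: the perceived emotion center lies on the
    segment between the audio-only and video-only perceptual centers.

    The abstract states only that the deviation from the audio-only center is
    proportional to the audio-video distance; `w` is a hypothetical constant
    of proportionality (0 = pure audio bias, 1 = pure video bias)."""
    audio_center = np.asarray(audio_center, dtype=float)
    video_center = np.asarray(video_center, dtype=float)
    return audio_center + w * (video_center - audio_center)

# Hypothetical (valence, activation, dominance) centers on a 1-5 scale.
audio_only = (2.0, 4.0, 3.5)   # e.g. angry-sounding speech
video_only = (4.0, 2.5, 2.5)   # e.g. happy-looking face
print(predicted_percept(audio_only, video_only))
```

Under this reading, a larger audio-video disparity produces a larger shift of the percept away from the audio-only center, which is consistent with the abstract's claim that the amount of cross-channel interaction scales with the emotional disparity between the channels.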