Joint-processing of audio-visual signals in human perception of conflicting synthetic character emotions

Expressive audio-visual synthetic characters are increasingly employed in research and commercial applications. However, the mechanisms people use to interpret conflicting or uncertain multimodal emotional displays from these agents are not yet well understood. This study seeks to better understand how conflicting expressive displays in the video and audio channels are interpreted, using a continuous dimensional evaluation framework of emotional valence, activation, and dominance. The results indicate that when two conflicting emotions are presented to subjects over the audio and video channels, the mean of the subjects' dimensional evaluations of the resulting emotional judgments lies between the audio-only and video-only emotion perceptual centers. Furthermore, the deviation from the audio-only center is proportional to the distance between the audio and video centers. This indicates that the perceptual judgment of conflicting emotions involves joint processing of the audio and video information, irrespective of the perceptual bias toward the audio channel. In general, the amount of interaction between the audio and video channels appears proportional to the emotional disparity between the two channels in the continuous emotional space considered in this study.
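One way to read the central finding is as a linear fusion model in the valence-activation-dominance space: the perceived center moves from the audio-only center toward the video-only center by a fraction of the distance between them. The sketch below is purely illustrative and not taken from the study; the weighting `w`, the function name, and the example coordinates are all hypothetical.

```python
import numpy as np

def predicted_percept(audio_center, video_center, w=0.4):
    """Illustrative linear fusion: the perceived emotion center lies on the
    segment between the audio-only and video-only perceptual centers.

    The abstract states only that the deviation from the audio-only center is
    proportional to the audio-video distance; `w` is a hypothetical constant
    of proportionality (0 = pure audio bias, 1 = pure video bias)."""
    audio_center = np.asarray(audio_center, dtype=float)
    video_center = np.asarray(video_center, dtype=float)
    return audio_center + w * (video_center - audio_center)

# Hypothetical (valence, activation, dominance) centers on a 1-5 scale.
audio_only = (2.0, 4.0, 3.5)   # e.g. angry-sounding speech
video_only = (4.0, 2.5, 2.5)   # e.g. happy-looking face
print(predicted_percept(audio_only, video_only))
```

Under this reading, a larger audio-video disparity produces a larger shift of the percept away from the audio-only center, which is consistent with the abstract's claim that the amount of cross-channel interaction scales with the emotional disparity between the channels.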