Emotion recognition from audiovisual information

We report preliminary results on machine recognition of emotion from joint audiovisual input consisting of facial video and speech. The results suggest that using both modalities has advantages over using either modality alone. The recognition rate is about 75% for audio alone and about 70% for video alone. Using audiovisual data, we achieved 97% without increasing the number of features. We attribute the improvement to the complementary nature of the two modalities. A possible application is natural human-computer interfaces.
