Audio-visual based emotion recognition - a new approach

Emotion recognition is one of the latest challenges in intelligent human-computer communication. Most previous work on emotion recognition has focused on extracting emotions from visual or audio information separately. This paper presents a novel approach that uses both the visual and the audio information in video clips to recognize human emotion. Facial feature tracking that is compliant with facial animation parameters (FAPs) and based on an active appearance model is performed on the video to generate two vector streams, one representing the expression features and the other the visual speech features. Alongside these visual vectors, an audio vector is extracted in terms of low-level features. A tripled hidden Markov model is then introduced to perform the recognition; it allows state asynchrony between the audio and visual observation sequences while preserving their natural correlation over time. Experimental results show that this approach outperforms recognition from the visual or the audio information alone.
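The abstract does not specify the structure of the tripled hidden Markov model, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes three coupled chains (expression, visual speech, audio), that each chain's next state depends on the previous states of all three chains, and that observations are discrete symbols rather than the continuous feature vectors the paper describes. The names triple_hmm_loglik, random_params and emission are hypothetical. The sketch computes the log-likelihood of a three-stream observation sequence with a scaled forward pass over the joint state space.

```python
import numpy as np

def emission(B, o):
    # Joint emission likelihood for every joint state (i, j, k).
    # Assumption: each chain emits its own symbol independently given its own state.
    return np.einsum('i,j,k->ijk', B[0][:, o[0]], B[1][:, o[1]], B[2][:, o[2]])

def triple_hmm_loglik(pi, A, B, obs):
    """Log-likelihood of obs (shape (T, 3)) under a three-chain coupled HMM.

    pi[c][s]          initial probability of chain c being in state s
    A[c][i, j, k, s]  P(chain c moves to s | previous states (i, j, k))
    B[c][s, o]        P(chain c emits symbol o | chain c is in state s)
    """
    alpha = np.einsum('i,j,k->ijk', pi[0], pi[1], pi[2]) * emission(B, obs[0])
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the coupled transitions, then weight by emissions.
            alpha = np.einsum('ijk,ijka,ijkb,ijkc->abc',
                              alpha, A[0], A[1], A[2]) * emission(B, obs[t])
        scale = alpha.sum()          # rescale each step to avoid underflow
        loglik += np.log(scale)
        alpha = alpha / scale
    return loglik

def random_params(n_states=(2, 2, 2), n_symbols=(2, 2, 2), seed=0):
    # Hypothetical helper: random, properly normalized parameters for a demo.
    rng = np.random.default_rng(seed)
    norm = lambda x: x / x.sum(axis=-1, keepdims=True)
    pi = [norm(rng.random(n)) for n in n_states]
    A = [norm(rng.random(n_states + (n,))) for n in n_states]
    B = [norm(rng.random((n, m))) for n, m in zip(n_states, n_symbols)]
    return pi, A, B

if __name__ == "__main__":
    pi, A, B = random_params()
    obs = np.array([[0, 1, 0], [1, 1, 0], [0, 0, 1]])  # one row per frame
    print(triple_hmm_loglik(pi, A, B, obs))
```

For classification, one such model could be trained per emotion and a clip assigned to the emotion whose model gives the highest log-likelihood. Because the three chains keep separate states, they can be at different positions at the same frame, which is one way to model asynchrony between streams while the shared transition conditioning keeps them correlated.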
