Speaking Effect Removal on Emotion Recognition From Facial Expressions Based on Eigenface Conversion

The speaking effect is a crucial issue that can dramatically degrade the performance of emotion recognition from facial expressions. To address this problem, an eigenface conversion-based approach is proposed that removes the speaking effect from facial expressions to improve recognition accuracy. In the proposed approach, a context-dependent linear conversion function, modeled by a statistical Gaussian Mixture Model (GMM), is trained on parallel data of speaking and non-speaking facial expressions of the same emotions. To model the speaking effect in greater detail, the conversion functions are clustered with a decision tree that considers the visual temporal context of the Articulatory Attribute (AA) classes of the corresponding input speech segments. To verify, from the reconstructed facial feature points, the quadrant of the emotional expression on the Arousal-Valence (A-V) emotion plane, which is commonly used to define emotion classes dimensionally, an expression template is constructed for each quadrant to represent the feature points of non-speaking facial expressions. Given the verified quadrant, a regression scheme then estimates the A-V values of the facial expression as a precise point on the A-V plane. Experimental results show that the proposed method outperforms existing approaches, demonstrating that removing the speaking effect from facial expressions improves emotion recognition performance.
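
The abstract does not spell out the form of the GMM-based linear conversion function, but in the voice-conversion literature it draws on, the standard construction is a joint-density GMM whose components each define a local linear mapping from source to target features. The sketch below applies that construction to facial feature vectors under that assumption; the data, dimensions, and names (`x`, `y`, `convert`) are illustrative placeholders, not the paper's actual implementation.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Hypothetical parallel data: time-aligned feature vectors extracted from
# speaking (x) and non-speaking (y) facial expressions of the same emotion.
rng = np.random.default_rng(0)
dim = 10
x = rng.standard_normal((2000, dim))             # speaking-face features
y = x + 0.3 * rng.standard_normal((2000, dim))   # non-speaking counterparts

# Fit one joint-density GMM over z = [x; y]; each mixture component then
# yields a local linear regression from x to y.
gmm = GaussianMixture(n_components=8, covariance_type="full",
                      random_state=0).fit(np.hstack([x, y]))

mu_x, mu_y = gmm.means_[:, :dim], gmm.means_[:, dim:]
S_xx = gmm.covariances_[:, :dim, :dim]
S_yx = gmm.covariances_[:, dim:, :dim]

def convert(x_t):
    """MMSE mapping of one speaking-face vector toward its non-speaking form:
    y_hat = sum_m p(m|x_t) * (mu_y[m] + S_yx[m] S_xx[m]^-1 (x_t - mu_x[m]))."""
    # Posterior component responsibilities under the marginal GMM of x.
    logp = np.array([np.log(gmm.weights_[m])
                     + multivariate_normal.logpdf(x_t, mu_x[m], S_xx[m])
                     for m in range(gmm.n_components)])
    post = np.exp(logp - logsumexp(logp))
    # Mixture of component-wise linear conversions, weighted by the posterior.
    return sum(post[m] * (mu_y[m]
                          + S_yx[m] @ np.linalg.solve(S_xx[m], x_t - mu_x[m]))
               for m in range(gmm.n_components))

y_hat = convert(x[0])   # reconstructed "non-speaking" feature vector
```

In the paper's pipeline, a decision tree over the AA classes of the input speech would first select which context-dependent conversion function to apply; the sketch shows a single context for brevity.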

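The regression scheme for the final A-V estimate is likewise left unspecified in the abstract. As one plausible instantiation, the sketch below trains a support vector regressor per dimension on the converted (speaking-effect-removed) features; the data and parameter choices are placeholders.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in for converted facial feature vectors with annotated
# arousal and valence values in [-1, 1]; restrict training to samples from
# the quadrant verified by the expression-template step (omitted here).
rng = np.random.default_rng(0)
features = rng.standard_normal((500, 10))
arousal = rng.uniform(-1, 1, 500)
valence = rng.uniform(-1, 1, 500)

# One regressor per emotion dimension.
svr_a = SVR(kernel="rbf", C=1.0).fit(features, arousal)
svr_v = SVR(kernel="rbf", C=1.0).fit(features, valence)

# Estimate a precise (A, V) point for a new converted feature vector.
point = (svr_a.predict(features[:1])[0], svr_v.predict(features[:1])[0])
print("Estimated (A, V) point:", point)
```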