Improvement of Thai speech emotion recognition by using face feature analysis

Thai is particularly difficult for emotion recognition because emotions are rarely manifested in Thai speech: pitch carries lexical meaning in this tonal language, so emotional modulation of the voice would otherwise interfere with what is being said. Our proposed Thai emotion recognition system consists of two parts: speech emotion recognition, and an improvement of that system through facial feature analysis. For this purpose, an audiovisual Thai emotion database was recorded. Speech emotion recognition is based on the fundamental frequency, zero-crossing rate, and energy computed from short-time wavelet signals, and achieves an accuracy of 97.8%. Our current research is directed at improving the accuracy of the overall system using facial feature analysis, thereby showing that vision is as crucial as hearing for expressing and recognizing emotion.
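As a concrete illustration of the acoustic front end described above, the sketch below computes frame-level fundamental frequency, zero-crossing rate, and short-time energy from a wavelet-filtered signal. The wavelet family ('db4'), the decomposition level, the frame sizes, and the autocorrelation-based F0 estimator are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
import pywt  # PyWavelets


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (the tail is dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])


def speech_emotion_features(x, sr, frame_len=512, hop=256, level=3):
    """Per-frame F0, zero-crossing rate, and energy of a wavelet-smoothed signal."""
    # Keep the level-3 approximation coefficients as the "short-time wavelet
    # signal" (one plausible reading); each level halves the effective rate.
    approx = pywt.wavedec(x, 'db4', level=level)[0]
    eff_sr = sr / 2 ** level

    frames = frame_signal(approx, frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)  # short-time energy per frame
    # Fraction of sign changes per sample, a standard ZCR estimate.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) / 2, axis=1)

    # Autocorrelation F0 estimate, searched over a 60-400 Hz pitch range
    # (assumes eff_sr stays well above that range).
    lo, hi = int(eff_sr // 400), int(eff_sr // 60)
    f0 = np.empty(len(frames))
    for i, f in enumerate(frames):
        ac = np.correlate(f, f, mode='full')[len(f) - 1:]
        f0[i] = eff_sr / (lo + np.argmax(ac[lo:hi]))
    return np.column_stack([f0, zcr, energy])
```

For a 16 kHz recording, `speech_emotion_features(signal, 16000)` yields one (F0, ZCR, energy) row per analysis frame, which could then be fed to a classifier such as an SVM.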
