Gender-Driven Emotion Recognition Through Speech Signals for Ambient Intelligence Applications

This paper proposes a system that recognizes a person's emotional state from recorded speech signals. The proposed solution aims to improve the interaction between humans and computers, enabling effective human-computer intelligent interaction. The system recognizes six emotions (anger, boredom, disgust, fear, happiness, and sadness) plus the neutral state, a set widely used for emotion recognition purposes. It can also distinguish a single emotion from all the other possible ones, as demonstrated by the reported numerical results. The system is composed of two subsystems: 1) gender recognition (GR) and 2) emotion recognition (ER). The experimental analysis reports the accuracy of the proposed ER system. The results highlight that a priori knowledge of the speaker's gender increases performance. They also show that adopting feature selection yields a satisfactory recognition rate while reducing the number of employed features. Future developments of the proposed solution may include implementing the system on mobile devices such as smartphones.
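The two-stage design described above lends itself to a compact implementation. The following Python sketch illustrates the idea under stated assumptions: per-utterance acoustic feature vectors (e.g., pitch and MFCC statistics) are assumed to be precomputed, and the SVM classifiers, the univariate feature-selection step, and all names and dimensions are illustrative choices rather than the paper's exact configuration. A gender classifier first routes each utterance, and a gender-specific emotion model with its own feature-selection stage then predicts the emotion.

```python
# Minimal sketch of a gender-driven emotion recognition pipeline.
# Assumptions (not from the paper): precomputed 40-dimensional feature
# vectors, SVM classifiers, and ANOVA-based feature selection.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EMOTIONS = ["anger", "boredom", "disgust", "fear",
            "happiness", "sadness", "neutral"]

class GenderDrivenER:
    """Two-stage classifier: gender recognition (GR) first, then a
    gender-specific emotion recognition (ER) model."""

    def __init__(self, n_selected_features=20):
        # Stage 1: gender classifier over the full feature vector.
        self.gr = make_pipeline(StandardScaler(), SVC())
        # Stage 2: one ER model per gender, each with a feature-selection
        # step that reduces the number of employed features.
        self.er = {
            g: make_pipeline(StandardScaler(),
                             SelectKBest(f_classif, k=n_selected_features),
                             SVC())
            for g in ("female", "male")
        }

    def fit(self, X, gender, emotion):
        self.gr.fit(X, gender)
        for g, model in self.er.items():
            mask = gender == g
            model.fit(X[mask], emotion[mask])
        return self

    def predict(self, X):
        # Route each sample to the emotion model matching its
        # predicted gender.
        g_pred = self.gr.predict(X)
        out = np.empty(len(X), dtype=object)
        for g, model in self.er.items():
            mask = g_pred == g
            if mask.any():
                out[mask] = model.predict(X[mask])
        return out

# Toy usage with random "acoustic" features, just to exercise the API.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
gender = rng.choice(["female", "male"], size=200)
emotion = rng.choice(EMOTIONS, size=200)
clf = GenderDrivenER().fit(X, gender, emotion)
print(clf.predict(X[:5]))
```

A one-vs-rest variant, matching the single-emotion-versus-all setting mentioned above, could be obtained by binarizing the emotion labels before fitting the per-gender models.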
