Speech Emotional Features Extraction Based on Electroglottograph

This study proposes two classes of speech emotional features extracted from the electroglottograph (EGG) and the speech signal. The power-law distribution coefficients (PLDC) of voiced-segment duration, pitch-rise duration, and pitch-fall duration are computed to capture information about vocal-fold excitation. The real discrete cosine transform (DCT) coefficients of the normalized spectra of the EGG and speech signals are computed to capture information about vocal-tract modulation. Two experiments are carried out: the first compares the proposed features with traditional features using sequential forward floating search (SFFS) and sequential backward floating search (SBFS) for feature selection; the second is a comparative emotion-recognition experiment based on a support vector machine (SVM). The results show that the proposed features outperform commonly used features in speaker-independent, content-independent speech emotion recognition.
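As a rough illustration of the two feature classes, the sketch below estimates a power-law exponent from a set of segment durations by maximum likelihood and takes the leading DCT coefficients of a normalized magnitude spectrum. The function names, the x_min handling, and the number of retained coefficients are illustrative assumptions, not the paper's exact PLDC or DCT procedure; the exponent estimator is the standard continuous-data fit of Clauset, Shalizi and Newman.

```python
# Minimal sketch of the two feature classes described in the abstract.
# power_law_exponent, spectrum_dct_features, and the defaults below are
# illustrative assumptions, not the authors' exact implementation.
import numpy as np
from scipy.fft import dct

def power_law_exponent(durations, x_min=None):
    """MLE of alpha for p(x) ~ x^(-alpha), x >= x_min
    (continuous-data estimator of Clauset, Shalizi & Newman)."""
    x = np.asarray(durations, dtype=float)
    if x_min is None:
        x_min = x.min()  # simplistic choice; the full method picks x_min by KS distance
    x = x[x >= x_min]
    return 1.0 + x.size / np.sum(np.log(x / x_min))

def spectrum_dct_features(signal, n_coeffs=12):
    """Leading DCT coefficients of the magnitude spectrum, normalized to unit sum."""
    spectrum = np.abs(np.fft.rfft(signal))
    spectrum /= spectrum.sum()  # normalization makes the features gain-invariant
    return dct(spectrum, type=2, norm='ortho')[:n_coeffs]
```

In use, the durations would come from voiced-segment and pitch-rise/fall detection on the EGG signal, and the resulting exponents and DCT coefficients would be concatenated into the feature vector fed to the SVM classifier.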
