Text-independent speech emotion recognition using frequency adaptive features

This paper studies a text-independent feature extraction method for emotional speech based on the spectral frequency bands of speech formants. First, emotional speech features are analyzed across different text contents: a variety of sentences is used to study phonetic influences, and formant frequencies are grouped into classes to reduce text variability, since these features are sensitive to phonetic changes within a sentence. Speaker emotions are then modeled within the different formant groups. Second, adaptive fundamental-frequency and Teager Energy Operator features are constructed in different frequency bands, with the Teager bands dynamically adapted to the pitch and formant distributions. The proposed adaptation is sensitive to emotional changes in speech, as reflected in statistics of the mean, variance, maximum, and minimum; these statistics over the basic acoustic parameters serve as the emotional features. Experimental results show that the proposed features are robust to text changes, with a lowest variance of 0.034. Recognition results for six major emotion types improve consistently, including a 5 percent improvement for sadness and a 3.3 percent improvement for boredom.
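To make the band-wise Teager Energy Operator (TEO) feature idea concrete, the following is a minimal sketch, not the authors' implementation: it band-passes a signal into a set of frequency bands, applies the standard discrete TEO psi[x(n)] = x(n)^2 - x(n-1)x(n+1) in each band, and summarizes each band with the mean, variance, maximum, and minimum statistics mentioned in the abstract. The band edges, filter order, and helper names here are illustrative assumptions; in the proposed method the bands would instead be adapted to the estimated pitch and formant distributions of each utterance.

```python
# Sketch of per-band Teager-energy statistics (assumed band edges and filters).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[x(n)] = x(n)^2 - x(n-1)*x(n+1)."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def band_teager_features(signal, fs, band_edges):
    """Band-pass the signal into each (low, high) band, apply the TEO,
    and summarize the band's Teager energy with mean/variance/max/min."""
    feats = []
    for lo, hi in band_edges:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        te = teager_energy(band)
        feats.extend([te.mean(), te.var(), te.max(), te.min()])
    return np.array(feats)

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # Synthetic test tone standing in for a speech frame.
    x = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
    # Hypothetical fixed band edges; the paper adapts these to the
    # pitch and formant distributions estimated from the utterance.
    bands = [(80, 400), (400, 1500), (1500, 3500)]
    print(band_teager_features(x, fs, bands))
```

In this sketch the adaptation step is left out: one would first estimate the fundamental frequency and formant locations, then place the band edges around them before computing the per-band statistics.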
