Enhanced speech emotion detection using deep neural networks

This paper investigates how effectively perceptually based speech features perform in emotion detection. The perceptual features considered are Mel frequency cepstral coefficients (MFCC), perceptual linear predictive cepstrum (PLPC), Mel frequency perceptual linear prediction cepstrum (MFPLPC), Bark frequency cepstral coefficients (BFCC), revised perceptual linear prediction coefficients (RPLP) and inverted Mel frequency cepstral coefficients (IMFCC). The algorithm using these auditory cues is evaluated with deep neural networks (DNN). The novelty of the work lies in analysing the perceptual features to identify the predominant ones that carry significant emotional information about the speaker. The algorithm is validated on the publicly available Berlin database with seven emotions, both in a 1-dimensional categorical space and in a 2-dimensional continuous space spanned by the valence and arousal dimensions. Comparative analysis reveals that a considerable improvement in emotion recognition performance is obtained using DNN with the identified combination of perceptual features.
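To illustrate how two of the listed features differ, the following is a minimal sketch of where the filter centres fall for a Mel filter bank (as used for MFCC) and for an inverted Mel filter bank (as used for IMFCC). It is not the paper's implementation. The Mel mapping 2595 log10(1 + f/700) is the commonly used formula; the function names, the number of filters, the 0 to 8000 Hz band, and the choice to build the inverted bank by mirroring the Mel centres about the band midpoint are assumptions made here for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Common Mel-scale mapping from frequency in Hz to Mel."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_centers(n_filters, f_low, f_high):
    """Centre frequencies (Hz) of n_filters filters spaced evenly on the Mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


def inverted_mel_centers(n_filters, f_low, f_high):
    """Assumed inversion: mirror each Mel centre f to f_low + f_high - f.

    The result is dense at high frequencies and sparse at low frequencies,
    the opposite of the Mel layout.
    """
    mirrored = [f_low + f_high - c for c in mel_centers(n_filters, f_low, f_high)]
    return sorted(mirrored)


if __name__ == "__main__":
    # Hypothetical settings: 8 filters over a 0-8000 Hz band.
    for name, centers in (
        ("Mel", mel_centers(8, 0.0, 8000.0)),
        ("Inverted Mel", inverted_mel_centers(8, 0.0, 8000.0)),
    ):
        print(name, [round(c) for c in centers])
```

Under these assumptions the Mel centres crowd the low end of the band and the inverted centres crowd the high end, which is why the two feature sets can carry complementary information.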
