Segment based emotion recognition using combined reduced features

A person's attitude is closely tied to their emotions. Emotions can be observed verbally, visually, or both. Verbal emotion recognition is a difficult task within speech processing and has a wide variety of applications. In this work, the authors attempt to recognize five emotions: anger, sadness, happiness, fear, and neutral. The work focuses on the choice of spectral features. For this purpose, Mel-frequency cepstral coefficients (MFCC), spectral roll-off, spectral centroid, and spectral flux are extracted at the frame level. Some of these features need to be reduced, combined, and balanced. The combined feature sets are then evaluated for their effectiveness. The resulting features are used with neural network (NN) based models for recognition. Multilayer perceptron (MLP), radial basis function network (RBFN), probabilistic neural network (PNN), and deep neural network (DNN) models are tested on the chosen features. A small number of features is observed to give reliable accuracy with the PNN, and the reduced feature set also requires less training and testing time for the MLP, RBFN, and PNN. The DNN, however, is not well suited to small feature sets; it requires a large amount of data for better accuracy in this field. With low-dimensional feature sets, the PNN achieves an average accuracy of 96.9%, whereas the MLP, RBFN, and DNN models achieve average accuracies of 90.1%, 92.7%, and 73.6%, respectively.
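To illustrate why a PNN can work with a small, low-dimensional feature set, the following is a minimal sketch of a generic Parzen-window PNN classifier. It is not the authors' implementation: the function name `pnn_predict`, the smoothing parameter `sigma`, and the two-dimensional toy vectors are all assumptions made for illustration, and the feature values are invented rather than taken from the paper.

```python
import math
from collections import defaultdict


def pnn_predict(train_x, train_y, x, sigma=0.5):
    """Classify x with a Parzen-window PNN; return (label, class_scores).

    sigma is a hypothetical smoothing parameter, not a value from the paper.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for xi, yi in zip(train_x, train_y):
        # Pattern layer: Gaussian kernel centred on each training vector.
        dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        sums[yi] += math.exp(-dist2 / (2.0 * sigma ** 2))
        counts[yi] += 1
    # Summation layer: average kernel response per class.
    scores = {label: sums[label] / counts[label] for label in sums}
    # Decision layer: pick the class with the largest estimated density.
    return max(scores, key=scores.get), scores


# Toy 2-D feature vectors (made up for illustration, not data from the paper).
train_x = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
train_y = ["neutral", "neutral", "anger", "anger"]

label, scores = pnn_predict(train_x, train_y, [0.95, 1.0])
print(label, scores)
```

A PNN of this kind has no iterative weight training: the stored training vectors act as the pattern layer, so "training" amounts to storing the samples and choosing `sigma`. This is in line with the short training time reported for the PNN above, though the actual network configuration used in the work is not given here.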
