Extraction of Novel Features Based on Histograms of MFCCs Used in Emotion Classification from Generated Original Speech Dataset

This paper makes two contributions. The first is a new feature, based on histograms of Mel-Frequency Cepstral Coefficients (MFCCs) extracted from audio files, for classifying emotion from speech signals. The second is a new multi-lingual, multi-speaker speech database covering three emotions. The Berlin Database (BD, in German) and our custom PAU database (in English), built from YouTube videos and popular TV shows, are used for training and evaluation. Experimental results show that the proposed features classify emotions more accurately than current state-of-the-art Support Vector Machine (SVM) approaches from the literature, outperforming a number of studies that combine MFCC features with an SVM classifier, including recent ones. Because no earlier approaches use the proposed feature, we implement one of the most common MFCC and SVM frameworks and use one of the most common databases, the Berlin Database, as the baseline for comparison.
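The abstract does not say how the histograms are built, so the following is only a minimal sketch of one plausible reading: for each MFCC coefficient, the values across all frames of an utterance are binned into a fixed-range histogram, and the normalized histograms are concatenated into one fixed-length vector. The function name `mfcc_histogram_features`, the bin count, the value range, and the per-coefficient normalization are all assumptions made for illustration, not the authors' published settings. The MFCC frames are assumed to have been computed beforehand by any standard front end.

```python
from typing import List, Sequence, Tuple


def mfcc_histogram_features(
    mfcc_frames: Sequence[Sequence[float]],
    n_bins: int = 8,                                   # assumed bin count
    value_range: Tuple[float, float] = (-50.0, 50.0),  # assumed value range
) -> List[float]:
    """Concatenate one normalized histogram per MFCC coefficient.

    mfcc_frames holds one row per frame and one column per coefficient.
    The result has n_coeffs * n_bins entries regardless of utterance length.
    """
    if not mfcc_frames:
        raise ValueError("at least one MFCC frame is required")

    low, high = value_range
    n_frames = len(mfcc_frames)
    n_coeffs = len(mfcc_frames[0])
    bin_width = (high - low) / n_bins

    features: List[float] = []
    for c in range(n_coeffs):
        counts = [0] * n_bins
        for frame in mfcc_frames:
            # Clip outliers into the range so every frame is counted once.
            clipped = min(max(frame[c], low), high)
            # The upper edge of the range falls into the last bin.
            idx = min(int((clipped - low) / bin_width), n_bins - 1)
            counts[idx] += 1
        # Normalize by frame count so each histogram sums to 1.
        features.extend(count / n_frames for count in counts)
    return features


# Toy input: 4 frames, 2 coefficients (real MFCCs typically use 12 or 13).
toy_frames = [[-50.0, 0.0], [50.0, 0.0], [0.0, 0.0], [10.0, 49.0]]
toy_features = mfcc_histogram_features(toy_frames, n_bins=4)
```

Because the vector length depends only on the number of coefficients and bins, utterances of different durations map to inputs of the same size, which is what a classifier such as an SVM requires.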
