Choice of a classifier, based on properties of a dataset: case study-speech emotion recognition

In this paper, a process for selecting a classifier based on the properties of a dataset is designed, since it is impractical to evaluate the data on an arbitrarily large number of classifiers. Speech emotion recognition is considered as a case study. Different combinations of spectral and prosodic features relevant to emotions are explored, and the best subset of the chosen features is recommended for each classifier based on the properties of the chosen dataset. Various statistical tests are used to estimate the properties of the dataset, and the nature of the dataset then guides the selection of a relevant classifier. To validate the selection, three other clustering and classification techniques (K-means clustering, vector quantization and artificial neural networks) are used for experimentation, and their results are compared with those of the selected classifier. The prosodic features considered in this work are pitch, intensity, jitter and shimmer; the spectral features are mel frequency cepstral coefficients (MFCCs) and formants. Statistical parameters of prosody, namely minimum, maximum, mean ($\mu$) and standard deviation ($\sigma$), are extracted from speech and combined with the basic spectral (MFCC) features to improve performance. Five basic emotions are considered: anger, fear, happiness, neutral and sadness. To analyse how different classifiers perform on different datasets, content- and speaker-independent emotional data collected from Telugu movies is used. Mean opinion scores from fifty users are used to label the emotional data. To generalize the conclusions, the benchmark IIT-Kharagpur emotional database is also used.
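The combination of prosodic statistics with MFCC features described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dictionary layout of the prosodic contours, the use of the population standard deviation, and the averaging of MFCCs over frames are all assumptions made for illustration, and the contours and MFCC frames are assumed to have been extracted beforehand by some speech analysis tool.

```python
from math import fsum, sqrt


def prosodic_stats(contour):
    """Return [min, max, mean, std] for one prosodic contour (e.g. pitch in Hz).

    The population standard deviation is used here; that choice is an assumption.
    """
    n = len(contour)
    mean = fsum(contour) / n
    std = sqrt(fsum((x - mean) ** 2 for x in contour) / n)
    return [min(contour), max(contour), mean, std]


def utterance_features(prosody, mfcc_frames):
    """Build one feature vector per utterance.

    prosody     -- hypothetical dict mapping a feature name to its contour
    mfcc_frames -- list of per-frame MFCC vectors of equal length
    """
    features = []
    # Sort the names so every utterance yields the same feature ordering.
    for name in sorted(prosody):
        features.extend(prosodic_stats(prosody[name]))
    # Summarise the spectral part by averaging each coefficient over frames
    # (an illustrative assumption, not a detail stated in the abstract).
    n_frames = len(mfcc_frames)
    n_coeffs = len(mfcc_frames[0])
    mfcc_mean = [fsum(frame[i] for frame in mfcc_frames) / n_frames
                 for i in range(n_coeffs)]
    return features + mfcc_mean


if __name__ == "__main__":
    toy_prosody = {"pitch": [100.0, 200.0, 300.0], "intensity": [60.0, 70.0]}
    toy_mfcc = [[1.0, 2.0], [3.0, 4.0]]
    print(utterance_features(toy_prosody, toy_mfcc))
```

Each prosodic contour contributes four values, so with the four prosodic features named in the abstract the prosodic part of the vector would have sixteen entries, followed by one entry per MFCC coefficient.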
