Efficient feature combination techniques for emotional speech classification

Enhancing the naturalness and efficiency of spoken-language man–machine interfaces has made the identification and classification of emotional speech a prominent research area. The reliability and accuracy of such emotion identification depend strongly on feature selection and extraction. In this paper, a combined feature selection technique is proposed that uses the reduced feature set produced by a vector quantizer (VQ) within a Radial Basis Function Neural Network (RBFNN) framework for classification. In the initial stage, Linear Prediction Coefficients (LPC) and the time–frequency Hurst parameter (pH) are used to extract relevant features, the two carrying complementary information about the emotional speech. Extensive simulations have been carried out on the Berlin Database of Emotional Speech (EMO-DB) with various combinations of feature sets. The experimental results show 76% accuracy for pH and 68% for LPC as standalone feature sets, whereas combining the vector-quantized feature sets (LP VQC and pH VQC) raises the average accuracy to 90.55%.
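The pipeline described above (frame-level LPC and pH features, VQ-based reduction, RBFNN classification) can be sketched in Python. The following is a minimal illustrative sketch, not the authors' implementation: it assumes librosa, PyWavelets, NumPy, and scikit-learn are available; the wavelet-based Hurst estimate, the codeword-histogram encoding, the RBF network built from k-means centres with a ridge readout, and all frame sizes and codebook/centre counts are simplifying assumptions, and the helper names (frame_signal, lpc_features, hurst_features, train_codebook, vq_histogram, SimpleRBFNet) are hypothetical.

```python
# Minimal illustrative sketch of the described pipeline (not the authors'
# implementation): frame-level LPC and wavelet-based Hurst (pH) features,
# VQ-based reduction to codeword histograms, and an RBF-network classifier.
import numpy as np
import librosa          # LPC analysis
import pywt             # wavelet decomposition for the Hurst estimate
from sklearn.cluster import KMeans
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics.pairwise import rbf_kernel

def frame_signal(y, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (sizes are placeholder choices)."""
    n = 1 + max(0, (len(y) - frame_len) // hop)
    return np.stack([y[i * hop: i * hop + frame_len] for i in range(n)])

def lpc_features(frames, order=12):
    """Per-frame Linear Prediction Coefficients (librosa's Burg-method LPC)."""
    return np.stack([librosa.lpc(f.astype(float), order=order)[1:] for f in frames])

def hurst_features(frames, wavelet="db2", level=4):
    """Per-frame Hurst estimate from the slope of log2 wavelet-detail variance
    across scales (an Abry-Veitch-style estimator, used here as a stand-in for
    the paper's time-frequency pH feature)."""
    feats = []
    for f in frames:
        coeffs = pywt.wavedec(f, wavelet, level=level)
        details = coeffs[:0:-1]                      # finest (j=1) to coarsest (j=level)
        log_var = [np.log2(np.mean(d ** 2) + 1e-12) for d in details]
        scales = np.arange(1, len(log_var) + 1)
        slope = np.polyfit(scales, log_var, 1)[0]
        feats.append([(slope + 1.0) / 2.0])          # H = (slope + 1) / 2 for fGn-like signals
    return np.array(feats)

def train_codebook(pooled_frame_features, codebook_size=32, seed=0):
    """Fit a VQ codebook (k-means centroids) on frame features pooled over training data."""
    return KMeans(n_clusters=codebook_size, n_init=10, random_state=seed).fit(pooled_frame_features)

def vq_histogram(codebook, frame_features):
    """Reduce one utterance to a normalized codeword-occupancy histogram (the 'VQC' vector)."""
    labels = codebook.predict(frame_features)
    hist = np.bincount(labels, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

class SimpleRBFNet:
    """RBF network: k-means centres -> Gaussian activations -> linear (ridge) readout."""
    def __init__(self, n_centres=20, gamma=1.0):
        self.n_centres, self.gamma = n_centres, gamma
    def fit(self, X, y):
        self.centres_ = KMeans(n_clusters=self.n_centres, n_init=10,
                               random_state=0).fit(X).cluster_centers_
        self.readout_ = RidgeClassifier().fit(rbf_kernel(X, self.centres_, gamma=self.gamma), y)
        return self
    def predict(self, X):
        return self.readout_.predict(rbf_kernel(X, self.centres_, gamma=self.gamma))
```

In use, one would pool training frames to fit the two codebooks, concatenate each utterance's LP VQC and pH VQC histograms into a single vector, and train SimpleRBFNet on the labelled EMO-DB utterances; the accuracy figures quoted above refer to the authors' setup, not to this sketch.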
