Child emotion recognition using probabilistic neural network with effective features

Abstract Using effective features is a step towards more accurate emotion recognition. The segmental features extracted from utterances are high-dimensional and redundant, whereas a neural-network-based model needs a low-dimensional feature set, so the reduction technique must be chosen carefully. This paper proposes a feature reduction mechanism that combines Vector Quantization (VQ) with eigenvalue decomposition. The lower-order eigen components are found to be more informative than the principal components and are used in this work to analyze emotions in children's speech. For classification, the Probabilistic Neural Network (PNN) is chosen because of its accuracy with statistical features. Accuracy is observed to be higher with the proposed VQ-based eigen features, reaching up to 97.5% for VQ-based Mel-frequency cepstral coefficients (MFCCs) reduced by their eigenvalues.
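As a rough illustration of the pipeline described in the abstract, the following Python sketch strings together a VQ codebook, an eigenvalue-based reduction, and a PNN-style classifier. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`vq_codebook`, `eigen_features`, `pnn_predict`), the use of plain k-means for VQ, the choice of keeping the `m` largest eigenvalues of the codebook covariance as the reduced feature vector, and the Gaussian smoothing parameter `sigma` are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def vq_codebook(frames, k, iters=20, seed=0):
    """Build a VQ codebook with plain k-means (assumed VQ variant).

    frames: array of shape (n_frames, n_coeffs), e.g. MFCC frames.
    Returns an array of shape (k, n_coeffs).
    """
    frames = np.asarray(frames, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise codewords from k distinct frames.
    codebook = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every frame to every codeword.
        d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # keep the old codeword if a cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook


def eigen_features(codebook, m):
    """Reduce a codebook to the m largest eigenvalues of its covariance.

    Which eigen components to keep is an assumption of this sketch.
    """
    cov = np.cov(codebook, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)  # returned in ascending order
    return eigvals[::-1][:m]


def pnn_predict(train_x, train_y, x, sigma=1.0):
    """Classify x with a Parzen-window (PNN-style) decision rule.

    Each class score is the mean Gaussian kernel response over that
    class's training patterns; the highest-scoring class wins.
    """
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    x = np.asarray(x, dtype=float)
    d2 = ((train_x - x) ** 2).sum(axis=1)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    classes = np.unique(train_y)
    scores = np.array([kernel[train_y == c].mean() for c in classes])
    return classes[scores.argmax()]
```

In use, each utterance's frame matrix would be passed through `vq_codebook` and `eigen_features` to obtain one short vector per utterance, and those vectors would serve as the training patterns and test inputs for `pnn_predict`. The codebook size `k`, the number of retained eigenvalues `m`, and `sigma` would need to be tuned on real data.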
