Paper information - Meta-heuristic approach in neural network for stress detection in Marathi speech

Stress is defined as a form of psychalgia. Given the present-day lifestyle of Homo sapiens, the most recurrent pain is psychogenic, and it is the most damaging form of psychalgia. In its most severe form, stress has led to the deaths of many individuals. According to a study conducted by the WHO in 2015, around 800,000 individuals die by suicide each year (one every 40 s). The only solution to this conundrum is to introduce efficient automated stress detection techniques that rely on proven measures and are unbiased; one such technique is “speech emotion recognition” (SER). Stress is not itself an emotion, but it gives rise to specific emotions. This paper proposes SER using a neural network classifier whose weights are optimized by a fusion of optimization algorithms, viz. the bat algorithm (BAT), genetic algorithm, particle swarm optimization, and simulated annealing. The classifier is trained on a multi-model feature set: Gammatone wavelet cepstral coefficients, Mel-frequency cepstral coefficients, pitch, vocal tract frequency, and energy are the features used to identify different emotions. Because detecting the stress level is the main objective, the SUSAS benchmark database and a Marathi language database are used for performance analysis. The performance parameters computed are the cost function, used to evaluate the meta-heuristic optimization algorithms, and the accuracy of emotion detection. An overall accuracy of 84.2% is achieved for stress-related emotions.
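The idea of optimizing neural network weights with a meta-heuristic can be illustrated with a minimal sketch. The code below is not the authors' implementation: it uses plain particle swarm optimization alone rather than the fusion of four algorithms described above, a toy data set instead of speech features, and mean squared error as an assumed cost function. All function names, parameters (`inertia`, `c1`, `c2`, `bound`) and the network layout are hypothetical choices made for illustration.

```python
import math
import random


def forward(weights, x, n_in, n_hidden):
    """One-hidden-layer network: tanh hidden units, sigmoid output.

    Assumed flat layout: for each hidden unit, n_in input weights then a
    bias; afterwards n_hidden output weights then the output bias.
    """
    idx = 0
    hidden = []
    for _ in range(n_hidden):
        s = weights[idx + n_in]  # hidden-unit bias
        for i in range(n_in):
            s += weights[idx + i] * x[i]
        hidden.append(math.tanh(s))
        idx += n_in + 1
    out = weights[idx + n_hidden]  # output bias
    for j in range(n_hidden):
        out += weights[idx + j] * hidden[j]
    return 1.0 / (1.0 + math.exp(-out))


def cost(weights, data, n_in, n_hidden):
    """Mean squared error over (input, target) pairs (assumed cost function)."""
    total = sum((forward(weights, x, n_in, n_hidden) - y) ** 2 for x, y in data)
    return total / len(data)


def pso_train(data, n_in, n_hidden, n_particles=20, n_iter=100, seed=0,
              inertia=0.7, c1=1.5, c2=1.5, bound=10.0):
    """Search the weight vector with basic global-best PSO.

    Returns the best weight vector found and the best cost per iteration.
    """
    rng = random.Random(seed)
    dim = n_hidden * (n_in + 1) + n_hidden + 1
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p, data, n_in, n_hidden) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_cost[k])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                # Clamp positions so the sigmoid argument stays small.
                pos[k][d] = max(-bound, min(bound, pos[k][d] + vel[k][d]))
            c = cost(pos[k], data, n_in, n_hidden)
            if c < pbest_cost[k]:
                pbest[k], pbest_cost[k] = pos[k][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[k][:], c
        history.append(gbest_cost)
    return gbest, history


if __name__ == "__main__":
    # Toy XOR data standing in for real feature vectors and emotion labels.
    xor = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
           ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    best, hist = pso_train(xor, n_in=2, n_hidden=3)
    print("initial best cost:", hist[0])
    print("final best cost:  ", hist[-1])
```

Because the swarm only ever replaces its global best with a strictly better candidate, the recorded cost never increases from one iteration to the next. A cost curve of this kind is one way to compare optimizers, in the spirit of the cost-function evaluation mentioned in the abstract.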

L. K. Ragha | Vaijanath V. Yerigeri
