Wavelet packet analysis for speaker-independent emotion recognition

Abstract: Extracting effective features from speech signals is essential for recognizing different emotions. Recent studies have demonstrated that wavelet analysis is a useful technique in signal processing. In this study, we use wavelet packet analysis to extract emotion features from speech signals for speaker-independent emotion recognition. We evaluate these features on two databases, EMODB and EESDB, and find that they are effective for recognizing various speech emotions. Furthermore, compared with common features such as Mel-Frequency Cepstral Coefficients (MFCC), these features improve the recognition rates by 14.9 and 4.3 percentage points on EMODB and EESDB, respectively.
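As a rough illustration of what wavelet packet feature extraction can look like, the sketch below decomposes one frame of samples into a full wavelet packet tree and takes the log energy of each sub-band as a feature vector. The abstract does not state the wavelet family, the decomposition depth, or the statistic computed per sub-band, so the Haar filter, the depth of 3, the log-energy feature, and all function names here are assumptions chosen for brevity, not the method used in the study.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar filter bank.

    Returns (approximation, detail), each half the length of x.
    Assumes len(x) is even.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_leaves(signal, level):
    """Full wavelet packet tree down to `level`.

    Unlike the plain discrete wavelet transform, both the approximation
    and the detail branch are split again at every level, which yields
    2**level sub-bands. Assumes len(signal) is divisible by 2**level.
    Sub-bands are returned in natural (tree) order, not sorted by frequency.
    """
    nodes = [list(signal)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def subband_log_energies(signal, level=3, eps=1e-12):
    """Hypothetical feature vector: log energy of each leaf sub-band."""
    return [
        math.log(sum(c * c for c in band) + eps)
        for band in wavelet_packet_leaves(signal, level)
    ]


# Example: a synthetic 64-sample frame standing in for a speech frame.
frame = [math.sin(0.3 * n) + 0.5 * math.sin(2.1 * n) for n in range(64)]
features = subband_log_energies(frame, level=3)
print(len(features))
```

Because the Haar filter bank is orthonormal, the sub-band energies sum to the energy of the input frame, so the feature vector describes how that energy is distributed across the 2**level bands. A practical system would more likely use a longer wavelet from a library such as PyWavelets and aggregate frame-level features over each utterance before classification.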
