Analysis and Classification of Cold Speech Using Variational Mode Decomposition

This paper presents the analysis and classification of cold speech, a form of pathological speech recorded while the speaker is suffering from the common cold. The common cold affects the nose and throat, and because both play an important role in speech production, the characteristics of speech are altered during this pathology. In this work, variational mode decomposition (VMD) is used to analyze and classify cold speech. VMD decomposes the speech signal into a number of sub-signals, or modes, which may expose the pathological information better than the undecomposed signal does. Four statistics (mean, variance, kurtosis, and skewness) are extracted from each decomposed sub-signal. In addition, the center frequency, energy, peak amplitude, spectral entropy, permutation entropy, and Renyi's entropy of each sub-signal are evaluated and used as features. Mutual information (MI) is then used to assign weights to the features. In terms of classification rate, the proposed feature set outperforms linear prediction coefficients (LPC), mel frequency cepstral coefficients (MFCC), a Teager energy operator (TEO) based feature, and the ComParE feature sets (IS09-emotion and IS13-ComParE). The proposed feature set achieves an average recognition rate of 90.02 percent on the IITG cold speech database and 66.84 percent on the URTIC database.
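The per-mode feature extraction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the modes have already been obtained from some VMD routine, and the function names (`mode_features`, `permutation_entropy`), the use of the spectral centroid as a stand-in for the VMD center frequency, the Renyi order of 2, and the permutation order of 3 are all assumptions made for the example. The MI-based weighting step is not shown.

```python
import math
import numpy as np


def permutation_entropy(x, order=3):
    """Normalized permutation entropy with unit delay (order is an assumed setting)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - order + 1
    # Ordinal pattern (rank order) of every length-`order` window.
    windows = np.array([x[i:i + order] for i in range(n)])
    patterns = np.argsort(windows, axis=1)
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    # Divide by log2(order!) so the value lies in [0, 1].
    return float(-np.sum(p * np.log2(p)) / math.log2(math.factorial(order)))


def mode_features(mode, fs, renyi_order=2):
    """Return the ten features named in the abstract for one mode (hypothetical helper)."""
    mode = np.asarray(mode, dtype=float)
    mean = mode.mean()
    var = mode.var()
    centered = mode - mean
    std = math.sqrt(var)

    # Normalized power spectrum, treated as a probability distribution.
    power = np.abs(np.fft.rfft(mode)) ** 2
    freqs = np.fft.rfftfreq(len(mode), d=1.0 / fs)
    p = power / power.sum()
    nz = p[p > 0]

    return {
        "mean": float(mean),
        "variance": float(var),
        "kurtosis": float(np.mean(centered ** 4) / std ** 4),  # non-excess kurtosis
        "skewness": float(np.mean(centered ** 3) / std ** 3),
        # Assumption: spectral centroid approximates the mode's center frequency.
        "center_frequency": float(np.sum(freqs * p)),
        "energy": float(np.sum(mode ** 2)),
        "peak_amplitude": float(np.max(np.abs(mode))),
        "spectral_entropy": float(-np.sum(nz * np.log2(nz))),
        "permutation_entropy": permutation_entropy(mode),
        # Renyi entropy of the spectral distribution; the order is an assumption.
        "renyi_entropy": float(np.log2(np.sum(nz ** renyi_order)) / (1 - renyi_order)),
    }


# Synthetic stand-ins for two VMD modes: pure tones at 100 Hz and 400 Hz.
fs = 8000
t = np.arange(fs) / fs
modes = [np.sin(2 * np.pi * 100 * t), 0.5 * np.sin(2 * np.pi * 400 * t)]

# One row of features per mode; rows would be concatenated per utterance.
feature_matrix = np.array([list(mode_features(m, fs).values()) for m in modes])
print(feature_matrix.shape)
```

In practice the synthetic tones would be replaced by the modes returned by a VMD solver, and the resulting feature vector would be weighted and passed to a classifier.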
