Kullback-Leibler divergence and sample skewness for pathological voice quality assessment

Abstract This paper proposes new features aimed at improving the performance of an automatic voice pathology detection system. The features are designed around the specific effects of voice pathologies on the speech signal, and the system is intended to deliver high accuracy with a small number of parameters. Kullback–Leibler divergence (KLD) applied to consecutive frames of the speech signal provides a measure of voice instability. In this work, the KLD is applied to each frame's histogram and to a modified form of its spectrum, named the higher amplitude suppression spectrum (HASS). The resulting H-KLD (histogram KLD) and HASS-KLD are two of the three proposed features. The third is the short-term sample skewness of the signal, which reflects the level of damping of the voice pitch period waveform. The H-KLD, the HASS-KLD, and the sample skewness are used along with mel-frequency cepstral coefficients (MFCC) in a voice pathology detection system. The system is composed of a Gaussian mixture model (GMM) classifier and two generalized extreme value (GEV) distribution classifiers, whose outputs are fused by a Gaussian naive Bayes classifier. A standard subset of the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database is adopted for evaluating the system. The obtained global success rate of 99.55% shows that the proposed features are suitable for pathological voice quality assessment.
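As a rough illustration of two of the features named in the abstract, the following Python sketch computes a histogram-based KLD between consecutive frames and a short-term sample skewness per frame. It is a minimal sketch under stated assumptions, not the authors' implementation: the frame length, hop size, number of histogram bins, the epsilon smoothing of empty bins, the direction of the divergence (frame t against frame t+1), and the function names are all hypothetical choices. The HASS-KLD is not sketched because the abstract does not define how the HASS is computed.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def histogram_kld(frames, n_bins=32, eps=1e-10):
    """KLD between amplitude histograms of consecutive frames.

    Assumption: all frames share the same bin edges, and empty bins are
    smoothed with a small eps so the logarithm stays finite.
    """
    edges = np.linspace(frames.min(), frames.max(), n_bins + 1)
    counts = np.stack([np.histogram(f, bins=edges)[0] for f in frames])
    p = counts.astype(float) + eps
    p /= p.sum(axis=1, keepdims=True)
    # D(p_t || p_{t+1}) for every pair of consecutive frames
    return np.sum(p[:-1] * np.log(p[:-1] / p[1:]), axis=1)


def sample_skewness(frames):
    """Per-frame sample skewness: third central moment over std cubed."""
    centered = frames - frames.mean(axis=1, keepdims=True)
    m2 = np.mean(centered ** 2, axis=1)
    m3 = np.mean(centered ** 3, axis=1)
    return m3 / np.maximum(m2, 1e-20) ** 1.5


# Usage on a synthetic, perfectly periodic signal (hypothetical values):
period = np.sin(2 * np.pi * np.arange(80) / 80)
x = np.tile(period, 10)
frames = frame_signal(x, frame_len=320, hop=160)
kld = histogram_kld(frames)
skew = sample_skewness(frames)
```

For a perfectly periodic signal whose hop spans a whole number of periods, consecutive frames are identical, so the histogram KLD is zero; an unstable voice would be expected to yield larger values. A symmetric waveform such as a sine gives a skewness near zero.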
