Voice Pathology Detection on the Saarbrücken Voice Database with Calibration and Fusion of Scores Using MultiFocal Toolkit

This paper presents a set of experiments on pathological voice detection over the Saarbrücken Voice Database (SVD), using the MultiFocal toolkit for discriminative calibration and fusion of scores. The SVD is freely available online and contains voice recordings covering a range of pathologies, both functional and organic. The classifier is a generative Gaussian mixture model trained on mel-frequency cepstral coefficients, harmonics-to-noise ratio, normalized noise energy, and glottal-to-noise excitation ratio. Scores are calibrated to improve performance at the desired operating point. Finally, fusing the scores of the different recordings available for each speaker, in which the vowels /a/, /i/, and /u/ are pronounced with normal, low, high, and low-high-low intonations, yields a large improvement in performance. Results are compared with those obtained on the Massachusetts Eye and Ear Infirmary (MEEI) database, and the comparison shows that the SVD is much more challenging.
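The calibration and fusion step described above can be illustrated with a minimal sketch. It assumes that fusion is an affine combination of per-recording scores trained by prior-weighted logistic regression, which is the general approach of linear score fusion; it does not use the MultiFocal toolkit or reproduce its API. The function names, the synthetic scores, the offsets, and the learning-rate settings are all hypothetical and chosen only for illustration.

```python
import numpy as np


def train_linear_fusion(scores, labels, prior=0.5, lr=0.2, n_iter=5000):
    """Fit w, b so that scores @ w + b acts as a calibrated log-likelihood ratio.

    Plain gradient descent on a prior-weighted logistic regression objective.
    With a single score column this reduces to calibration of one system.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # Weight each class so the objective reflects the chosen prior,
    # not the class proportions of the training set.
    sample_w = np.where(labels == 1, prior / n_pos, (1 - prior) / n_neg)
    logit_prior = np.log(prior / (1 - prior))
    w = np.zeros(scores.shape[1])
    b = 0.0
    for _ in range(n_iter):
        llr = scores @ w + b
        posterior = 1.0 / (1.0 + np.exp(-(llr + logit_prior)))
        err = sample_w * (posterior - labels)
        w -= lr * (scores.T @ err)
        b -= lr * err.sum()
    return w, b


def accuracy(llr, labels):
    """Fraction of correct decisions when thresholding the LLR at zero."""
    return float(np.mean((llr > 0) == (labels == 1)))


def make_scores(rng, n_speakers, offsets, noise_std=1.5):
    """Synthetic, miscalibrated scores: one column per recording (assumption)."""
    labels = rng.integers(0, 2, size=n_speakers)  # 1 = pathological, 0 = normal
    centers = np.where(labels == 1, 1.0, -1.0)[:, None]
    noise = rng.normal(0.0, noise_std, size=(n_speakers, len(offsets)))
    return centers + noise + np.asarray(offsets), labels


rng = np.random.default_rng(0)
offsets = [0.5, -0.5, 0.0]  # hypothetical per-recording score biases
train_x, train_y = make_scores(rng, 1000, offsets)
test_x, test_y = make_scores(rng, 1000, offsets)

# Calibrate each recording on its own, then fuse all three.
single_accs = []
for k in range(train_x.shape[1]):
    wk, bk = train_linear_fusion(train_x[:, [k]], train_y)
    single_accs.append(accuracy(test_x[:, [k]] @ wk + bk, test_y))

w, b = train_linear_fusion(train_x, train_y)
fused_acc = accuracy(test_x @ w + b, test_y)

print("single-recording accuracies:", single_accs)
print("fused accuracy:", fused_acc)
```

In this toy setting the recordings carry independent noise, so the fused score separates the two classes better than any single calibrated score. Real recordings of the same speaker are correlated, so the gain on actual data depends on how complementary the recordings are.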
