Normalized modulation spectral features for cross-database voice pathology detection

In this paper, we employ normalized modulation spectral analysis for voice pathology detection. Such normalization is important when there is a mismatch between training and testing conditions, that is, when the detection system is employed in real (testing) conditions. Modulation spectra usually produce a high-dimensional feature space. For classification purposes, the size of the original space is reduced using Higher Order Singular Value Decomposition (HOSVD). Further, we select the most relevant features based on the mutual information between subjective voice quality and the computed features, which yields a modulation spectral representation adapted to the classification task. For voice pathology detection, the adaptive modulation spectra are combined with an SVM classifier. To simulate real testing conditions, two different databases are used: one for training and the other for testing. We address the difference in signal characteristics between training and testing data through subband normalization of modulation spectral features. Simulations show that feature normalization enables cross-database detection of pathological voices even when training and test data are drawn from different databases.
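Two steps of the pipeline described above, subband normalization and mutual-information feature ranking, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not specify the normalization, so each acoustic-frequency subband is assumed to be scaled to unit sum across modulation frequency; mutual information is estimated with a simple equal-width histogram (plug-in) estimator; and a discrete class label stands in for the subjective voice quality rating. All function names (`subband_normalize`, `discretize`, `mutual_information`, `select_features`) are hypothetical, and the HOSVD and SVM stages are omitted.

```python
import math
from collections import Counter


def subband_normalize(mod_spec):
    """Normalize a modulation spectrum per acoustic-frequency subband.

    mod_spec[k][m] is the modulation energy in acoustic band k at modulation
    frequency bin m. Assumption: each row (subband) is scaled to unit sum.
    """
    normalized = []
    for row in mod_spec:
        total = sum(row)
        # Leave an all-zero (silent) subband unchanged to avoid division by zero.
        normalized.append([v / total for v in row] if total > 0 else list(row))
    return normalized


def discretize(values, n_bins):
    """Map real values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def select_features(features, labels, k, n_bins=4):
    """Return indices of the k features with the highest MI with the labels.

    features is a list of samples, each a flat list of feature values.
    Ties are broken by the lower feature index.
    """
    n_feat = len(features[0])
    scores = []
    for j in range(n_feat):
        column = discretize([row[j] for row in features], n_bins)
        scores.append((mutual_information(column, labels), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


if __name__ == "__main__":
    # Two subbands, two modulation bins each.
    print(subband_normalize([[1.0, 3.0], [2.0, 2.0]]))  # [[0.25, 0.75], [0.5, 0.5]]

    # Toy data: feature 0 separates the classes, feature 1 does not.
    feats = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.0, 2.0]]
    labels = [0, 0, 1, 1]
    print(select_features(feats, labels, k=1, n_bins=2))  # [0]
```

Normalizing each subband separately removes per-band gain differences (for example, from different recording channels) before features are compared across databases; the selected feature indices would then feed a classifier such as an SVM, which is not shown here.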
