Selecting disorder-specific features for speech pathology fingerprinting

The general aim of this work is to learn a unique statistical signature for the state of a particular speech pathology. We pose this as a speaker identification problem for dysarthric individuals. To that end, we propose a novel feature-selection algorithm that aims to minimize the influence of speaker-specific features (e.g., fundamental frequency) and maximize the influence of pathology-specific features (e.g., vocal tract distortions and speech rhythm). We derive a cost function for feature selection that trades off between these two competing criteria. We then develop an efficient algorithm that optimizes this cost function and test it on a set of 34 dysarthric and 13 healthy speakers. Results show that the proposed method yields a set of features related to the speech disorder rather than to an individual's speaking style. Compared with other feature-selection algorithms, the proposed approach improves performance on a disorder fingerprinting task because the features it selects are specific to the disorder.
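The abstract does not give the cost function itself, so the following is only a hypothetical sketch of the general idea: score each feature by how well it separates disorder groups, subtract a penalty for how well it separates individual speakers within the same group, and select features greedily. The Fisher ratio used as the separability score, the within-group speaker penalty, the mRMR-style redundancy term, and the names `fisher_ratio`, `speaker_score`, `select_features`, `lam`, and `mu` are all assumptions for illustration, not the authors' method.

```python
import numpy as np


def fisher_ratio(X, y, eps=1e-12):
    """Per-feature ratio of between-class to within-class scatter (assumed score)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        between += len(Xc) * (mu_c - mu) ** 2
        within += ((Xc - mu_c) ** 2).sum(axis=0)
    return between / (within + eps)


def speaker_score(X, speakers, disorders):
    """Average speaker Fisher ratio computed *within* each disorder group,
    so that features separating disorders are not counted as speaker-specific."""
    scores = []
    for d in np.unique(disorders):
        mask = disorders == d
        if len(np.unique(speakers[mask])) > 1:
            scores.append(fisher_ratio(X[mask], speakers[mask]))
    return np.mean(scores, axis=0) if scores else np.zeros(X.shape[1])


def select_features(X, speakers, disorders, k, lam=1.0, mu=0.5):
    """Greedy forward selection of k feature indices.

    Hypothetical cost per candidate j:
      disorder_score[j] - lam * speaker_score[j] - mu * mean |corr(j, selected)|
    """
    X = np.asarray(X, dtype=float)
    speakers = np.asarray(speakers)
    disorders = np.asarray(disorders)
    relevance = fisher_ratio(X, disorders) - lam * speaker_score(X, speakers, disorders)
    # Absolute feature-feature correlation; constant columns give NaN, mapped to 0.
    corr = np.nan_to_num(np.atleast_2d(np.abs(np.corrcoef(X, rowvar=False))))

    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        def gain(j):
            redundancy = corr[j, selected].mean() if selected else 0.0
            return relevance[j] - mu * redundancy

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With synthetic data in which one feature tracks the disorder group and another tracks speaker identity within a group, this sketch picks the disorder-tracking feature first and avoids the speaker-tracking one. The weights `lam` and `mu` play the role of the trade-off the abstract describes; how the actual paper balances the two criteria is not stated here.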
