Spectro-Temporal Sparsity Characterization for Dysarthric Speech Detection

To assist the clinical diagnosis and treatment of neurological diseases that cause speech dysarthria, such as Parkinson's disease (PD), it is of paramount importance to craft robust features that can be used to automatically discriminate between healthy and dysarthric speech. Since the dysarthric speech of patients suffering from PD is breathy and semi-whispery, and is characterized by abnormal pauses and imprecise articulation, its spectro-temporal sparsity can be expected to differ from that of healthy speech. While we have recently successfully used temporal sparsity characterization for dysarthric speech detection, characterizing spectral sparsity poses the challenge of constructing a valid feature vector from signals with different numbers of unaligned time frames. Further, although several non-parametric and parametric measures of sparsity exist, it is unknown which sparsity measure yields the best performance in the context of dysarthric speech detection. The objective of this paper is to demonstrate the advantages of spectro-temporal sparsity characterization for automatic dysarthric speech detection. To this end, we first provide a numerical analysis of the suitability of different non-parametric and parametric measures (i.e., $l_1$-norm, kurtosis, Shannon entropy, Gini index, shape parameter of a Chi distribution, and shape parameter of a Weibull distribution) for sparsity characterization. It is shown that kurtosis, the Gini index, and the parametric sparsity measures are advantageous sparsity measures, whereas the $l_1$-norm and entropy measures fail to robustly characterize the temporal sparsity of signals with different numbers of time frames. Second, we propose to characterize the spectral sparsity of an utterance by first time-aligning it to the same utterance uttered by an (arbitrarily selected) reference speaker using dynamic time warping.
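The time-alignment step can be illustrated with a minimal sketch of textbook dynamic time warping. The abstract does not specify the local distance, the step pattern, or how aligned frames are merged, so the Euclidean frame distance, the three-step pattern, and the per-reference-frame averaging below are assumptions, and the function names `dtw_path` and `align_to_reference` are hypothetical.

```python
import numpy as np

def dtw_path(X, Y):
    """Return the DTW warping path between feature sequences X (n, d) and Y (m, d).

    Assumes a Euclidean frame distance and the steps (1,1), (1,0), (0,1).
    """
    n, m = len(X), len(Y)
    dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost with a padded border
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the last frame pair to the first one.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    return path[::-1]

def align_to_reference(X, Y_ref):
    """Map X onto the time axis of Y_ref by averaging the frames of X
    that the warping path assigns to each reference frame (an assumption)."""
    aligned = np.zeros(Y_ref.shape, dtype=float)
    counts = np.zeros(len(Y_ref))
    for i, j in dtw_path(X, Y_ref):
        aligned[j] += X[i]
        counts[j] += 1
    return aligned / counts[:, None]

# Toy example: two "utterances" with different numbers of frames.
rng = np.random.default_rng(0)
utterance = rng.random((7, 3))   # 7 frames, 3 features per frame
reference = rng.random((5, 3))   # 5 frames, 3 features per frame
aligned = align_to_reference(utterance, reference)
print(aligned.shape)
```

After alignment, every utterance has as many frames as the reference, so per-frame quantities can be stacked into feature vectors of equal length across speakers.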
Experimental results on a Spanish database of healthy and dysarthric speech show that estimating the spectro-temporal sparsity using the Gini index or the parametric sparsity measures and using it as a feature in a support vector machine results in a high classification accuracy of 83.3%.
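The dependence of some sparsity measures on signal length can be checked numerically with a small sketch. The definitions below follow common textbook forms of the non-parametric measures and are assumptions, since the abstract does not state the exact formulas used; in particular, kurtosis is taken here as the ratio of the fourth moment to the squared second moment, and the entropy is computed on energy-normalized coefficients.

```python
import numpy as np

def l1_norm(c):
    """Sum of magnitudes; grows with both the scale and the length of c."""
    return np.sum(np.abs(c))

def kurtosis(c):
    """Fourth moment divided by the squared second moment (assumed form)."""
    c = np.abs(np.asarray(c, dtype=float))
    return np.mean(c**4) / np.mean(c**2) ** 2

def shannon_entropy(c):
    """Entropy of the energy-normalized coefficients (assumed form)."""
    p = np.asarray(c, dtype=float) ** 2
    p = p / p.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return -np.sum(p * np.log(p))

def gini_index(c):
    """Gini index of the sorted magnitudes; 0 for a flat vector, near 1 for a sparse one."""
    c = np.sort(np.abs(np.asarray(c, dtype=float)))
    n = c.size
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum((c / c.sum()) * (n - k + 0.5) / n)

# A signal and a version that is twice as long with the same distribution.
rng = np.random.default_rng(0)
x = rng.exponential(size=50)
x_long = np.tile(x, 2)

for name, f in [("l1", l1_norm), ("kurtosis", kurtosis),
                ("entropy", shannon_entropy), ("gini", gini_index)]:
    print(f"{name:9s} {f(x):8.4f} {f(x_long):8.4f}")
```

Under these definitions, kurtosis and the Gini index are unchanged when the signal is repeated, whereas the $l_1$-norm doubles and the entropy increases by $\log 2$. This matches the observation that the latter two are not robust to the number of time frames.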
