Automatic Separation of Various Disease Types by Correlation Structure of Time Shifted Speech Features

Various diseases can affect the complex mechanisms of speech production in different ways, causing a range of speech disorders. For this reason, biomarkers extracted from speech could serve as reliable indicators of these diseases. This paper aims to separate healthy speech samples from several groups of disordered speech produced by patients with different diseases, namely depression, Parkinson's disease, morphological alteration of the vocal organs, functional dysphonia, and recurrent paresis. Correlation matrices of the time-shifted values of formant frequencies (F1, F2, F3), mel-filter band energies, mel-frequency cepstral coefficients (MFCCs), fundamental frequency (F0), and intensity were used as input for classification. Support vector machine (SVM) and k-nearest neighbor (k-NN) classifiers were applied and their performance compared. In the six-class experiment (healthy speech plus the five disease groups), the best overall accuracy was 54.75%; when the disorders were re-categorized into four classes, the accuracy reached 77.59%. Based on these results, a speech-based diagnostic tool could be developed that supports clinical staff by providing them with a novel diagnostic marker.
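The pipeline summarized above, building a correlation matrix from time-shifted copies of frame-level speech features and passing it to a classifier, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the delay set, the per-frame feature layout, flattening the upper triangle into a feature vector, and all function names (`time_shifted_correlation`, `upper_triangle`, `knn_predict`) are hypothetical. Feature extraction (formants, F0, intensity, MFCCs) is assumed to have been done beforehand by an external tool, and only the k-NN classifier is shown; an SVM would consume the same feature vectors.

```python
import math
from collections import Counter


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def time_shifted_correlation(frames, delays):
    """Correlation matrix over all (channel, delay) pairs of a feature sequence.

    frames: list of T per-frame feature vectors with C channels each
            (hypothetical layout, e.g. [F1, F2, F3, F0, intensity]).
    delays: non-negative frame shifts, e.g. [0, 1, 2] (an assumed choice).
    Returns a square matrix of size C * len(delays), as nested lists.
    """
    num_channels = len(frames[0])
    # All shifted copies are truncated to the same length so they can be correlated.
    usable = len(frames) - max(delays)
    if usable < 2:
        raise ValueError("need at least max(delays) + 2 frames")
    # One time series per (channel, delay) pair: channel c read d frames later.
    signals = [
        [frames[t + d][c] for t in range(usable)]
        for c in range(num_channels)
        for d in delays
    ]
    return [[pearson(s, r) for r in signals] for s in signals]


def upper_triangle(matrix):
    """Flatten entries above the diagonal; the matrix is symmetric, so the lower half is redundant."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote of its k nearest neighbours (Euclidean)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

In use, one matrix would be computed per recording, flattened with `upper_triangle`, and collected together with its diagnosis label into `train_x` and `train_y`; a new recording is then labeled by `knn_predict`. Because Pearson correlation is invariant to per-channel offset and positive scaling, the channels need no normalization before the matrix is built, although the feature vector grows quadratically with the number of channels times delays.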
