Estimation of Parkinson’s disease severity using speech features and extreme gradient boosting

In recent years, interest in building e-health systems has grown. These systems, which deliver health services through internet and communication technologies, aim to reduce the costs arising from outpatient visits. Several recent studies propose machine learning-based telediagnosis and telemonitoring systems for Parkinson's disease (PD). Motivated by studies showing the potential of speech disorders in PD telemonitoring systems, we aim to estimate the severity of PD from patients' voice recordings, using the motor Unified Parkinson's Disease Rating Scale (UPDRS) as the evaluation metric. For this purpose, we apply various speech processing algorithms to the patients' voice signals and use the extracted features as input to a two-stage estimation model. In the first stage, a wrapper-based feature selection algorithm called Boruta selects the most informative speech features. In the second stage, the selected features are fed to a decision tree-based boosting algorithm, extreme gradient boosting, which has recently been applied successfully to many machine learning tasks owing to its generalization ability and speed. The feature selection analysis showed that the vibration pattern of the vocal folds is an important indicator of PD severity. We also investigate the effectiveness of using age and years since diagnosis as covariates together with the speech features. The lowest mean absolute error, 3.87, was obtained by combining these covariates and the speech features with prediction-level fusion.

Graphical abstract: framework for the proposed UPDRS estimation model.
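The abstract does not say how the prediction-level fusion is computed, so the following is only a minimal sketch under stated assumptions: the outputs of two separately trained regressors (one on speech features, one on the age and years-since-diagnosis covariates) are combined by a weighted average and then scored with the mean absolute error. The function names, the averaging rule, the weight, and all numbers are hypothetical and are not taken from the study.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def fuse_predictions(pred_a, pred_b, weight_a=0.5):
    """Assumed fusion rule: weighted average of two models' predictions."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have the same length")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
updrs_true = [20.0, 35.0, 12.0, 28.0]      # reference motor UPDRS scores
speech_pred = [24.0, 30.0, 15.0, 25.0]     # model trained on speech features
covariate_pred = [18.0, 38.0, 11.0, 33.0]  # model trained on the covariates

fused_pred = fuse_predictions(speech_pred, covariate_pred)

print("speech-only MAE:   ", mean_absolute_error(updrs_true, speech_pred))
print("covariate-only MAE:", mean_absolute_error(updrs_true, covariate_pred))
print("fused MAE:         ", mean_absolute_error(updrs_true, fused_pred))
```

In this constructed example the fused error is lower than either single-model error because the two models err in opposite directions; that is a property of the toy numbers, not evidence about the reported results.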
