Comparing Natural Language Processing Techniques for Alzheimer's Dementia Prediction in Spontaneous Speech

Alzheimer's Dementia (AD) is an incurable, debilitating, and progressive neurodegenerative condition that affects cognitive function. Early diagnosis is important because therapeutics can delay progression and give those diagnosed vital time. Models that analyse spontaneous speech could eventually provide an efficient diagnostic modality for earlier diagnosis of AD. The Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) task offers acoustically pre-processed and balanced datasets for the classification and prediction of AD and associated phenotypes through the modelling of spontaneous speech. We analyse only the supplied textual transcripts of the spontaneous speech dataset, building and comparing numerous models for the classification of AD vs controls and the prediction of Mini-Mental State Examination (MMSE) scores. We rigorously train and evaluate Support Vector Machines (SVMs), Gradient Boosting Decision Trees (GBDTs), and Conditional Random Fields (CRFs) alongside deep learning Transformer-based models. Our top-performing models are a simple Term Frequency-Inverse Document Frequency (TF-IDF) vectoriser used as input to an SVM, and the pre-trained Transformer-based model DistilBERT used as an embedding layer feeding simple linear models. We report test set scores of 0.81-0.82 across classification metrics and a Root Mean Squared Error (RMSE) of 4.58 for MMSE score prediction.
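As a rough illustration of two quantities named above, the sketch below computes TF-IDF vectors for a couple of transcripts and the RMSE between true and predicted MMSE scores, using only the Python standard library. It is a minimal sketch under stated assumptions, not the pipeline used in this work: the whitespace tokeniser, the smoothed IDF variant with L2 normalisation, the function names `tfidf_vectors` and `rmse`, and the two toy transcripts are all illustrative choices, and the SVM step that would consume these vectors is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using one common smoothed TF-IDF variant.

    Assumptions (illustrative, not taken from the paper):
    lowercase whitespace tokenisation, idf = ln((1 + n) / (1 + df)) + 1,
    raw term counts as TF, and L2 normalisation of each document vector.
    """
    tokenised = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenised for tok in doc})
    n_docs = len(tokenised)

    # Document frequency: in how many transcripts each term occurs.
    doc_freq = Counter(tok for doc in tokenised for tok in set(doc))
    idf = {
        tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0
        for tok in vocabulary
    }

    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        weights = [counts[tok] * idf[tok] for tok in vocabulary]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        vectors.append([w / norm for w in weights])
    return vocabulary, vectors


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length score sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Invented toy transcripts, not drawn from the ADReSS data.
toy_transcripts = [
    "the boy takes the cookie",
    "the water is overflowing",
]
vocabulary, vectors = tfidf_vectors(toy_transcripts)

# Invented MMSE-like scores, purely to exercise the metric.
toy_rmse = rmse([30, 20], [28, 24])
```

In this weighting, a term that occurs in every transcript (here "the") gets the minimum IDF of 1, so at equal counts it contributes less to a document vector than a term confined to one transcript. The resulting vectors are the kind of fixed-length input a linear classifier such as an SVM expects.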
