Multiscale System for Alzheimer's Dementia Recognition Through Spontaneous Speech

This paper describes the Verisk submission to the ADReSS Challenge [24]. We analyze the text data at both the word level and the phoneme level; combined with audio features, this yields our best-performing system. The system is thus both multi-modal (audio and text) and multi-scale (word and phoneme levels). Experiments with larger neural language models did not improve results, given the small amount of text data available. By contrast, the phoneme representation has a vocabulary of only 66 tokens and could be trained from scratch on the present data. We therefore believe this method to be useful when text data is limited, as in many medical settings.
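As a minimal sketch of the multi-scale idea, the same transcript can be viewed as a word sequence and as a phoneme sequence, with per-model scores then combined across modalities. Everything named here is an illustrative assumption rather than the submitted system: the toy lexicon `TOY_LEXICON`, the helper names, the n-gram features, and the probability-averaging fusion rule are hypothetical, since the abstract does not specify the grapheme-to-phoneme tool, the classifiers, or the fusion method.

```python
from collections import Counter

# Hypothetical toy pronunciation lexicon (ARPAbet-like symbols). A real system
# would use a full grapheme-to-phoneme tool, which the abstract does not name.
TOY_LEXICON = {
    "the": ["DH", "AH"],
    "boy": ["B", "OY"],
    "is": ["IH", "Z"],
    "on": ["AA", "N"],
    "stool": ["S", "T", "UW", "L"],
}


def word_tokens(transcript):
    """Word-level view: lowercase, whitespace-separated tokens."""
    return transcript.lower().split()


def phoneme_tokens(transcript, lexicon=TOY_LEXICON, unk="<unk>"):
    """Phoneme-level view: concatenate the phonemes of each word.

    Words missing from the lexicon map to a single unknown token.
    """
    phonemes = []
    for word in word_tokens(transcript):
        phonemes.extend(lexicon.get(word, [unk]))
    return phonemes


def ngram_counts(tokens, n):
    """Count contiguous n-grams; usable at either the word or phoneme scale."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def late_fusion(probabilities):
    """Average per-model probabilities (an assumed, simple fusion rule)."""
    return sum(probabilities) / len(probabilities)


if __name__ == "__main__":
    text = "The boy is on the stool"
    print(word_tokens(text))
    print(phoneme_tokens(text))
    print(ngram_counts(phoneme_tokens(text), 2).most_common(1))
    # Placeholder scores standing in for word, phoneme, and audio models.
    print(late_fusion([0.5, 0.7, 0.9]))
```

Because the phoneme inventory is small and closed, every phoneme token recurs often even in a short transcript, which is the property that makes training from scratch on limited data plausible.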

[1] Kathleen C. Fraser, et al. Multilingual word embeddings for the assessment of narrative speech in mild cognitive impairment, 2019, Comput. Speech Lang.

[2] Zheng-Hua Tan, et al. rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method, 2020, Comput. Speech Lang.

[3] Heidi Christensen, et al. Detecting Signs of Dementia Using Word Vector Representations, 2018, INTERSPEECH.

[4] K. Scherer, et al. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common, 2013, Front. Psychol.

[5] Mohit Bansal, et al. Detecting Linguistic Characteristics of Alzheimer’s Dementia by Interpreting Neural Models, 2018, NAACL.

[6] Heidi Christensen, et al. Simple and robust audio-based detection of biomarkers for Alzheimer's disease, 2016.

[7] Dolores E. López, et al. Speech in Alzheimer's Disease: Can Temporal and Acoustic Parameters Discriminate Dementia?, 2014, Dementia and Geriatric Cognitive Disorders.

[8] Kathleen C. Fraser, et al. Linguistic Features Identify Alzheimer's Disease in Narrative Speech, 2015, Journal of Alzheimer's Disease: JAD.

[9] Björn W. Schuller, et al. Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10] Matteo Pagliardini, et al. Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features, 2017, NAACL.

[11] Daoqiang Zhang, et al. Multimodal classification of Alzheimer's disease and mild cognitive impairment, 2011, NeuroImage.

[12] Sterling C. Johnson, et al. Multimodal and Multiscale Deep Neural Networks for the Early Diagnosis of Alzheimer's Disease using structural MR and FDG-PET images, 2017, ArXiv.

[13] Matteo Pagliardini, et al. Better Word Embeddings by Disentangling Contextual n-Gram Information, 2019, NAACL.

[14] Miguel Angel Ferrer-Ballester, et al. Alzheimer's disease and automatic speech analysis: A review, 2020, Expert Syst. Appl.

[15] Björn Schuller, et al. openSMILE: the Munich versatile and fast open-source audio feature extractor, 2010, ACM Multimedia.

[16] Sylvester Olubolu Orimaye, et al. Predicting probable Alzheimer’s disease using linguistic deficits and biomarkers, 2017, BMC Bioinformatics.

[17] Kathleen C. Fraser, et al. Automated classification of primary progressive aphasia subtypes from narrative speech transcripts, 2014, Cortex.

[18] Ninon Burgos, et al. Convolutional Neural Networks for Classification of Alzheimer's Disease: Overview and Reproducible Evaluation, 2019, Medical Image Anal.

[19] Karalyn Patterson, et al. Phonological and Articulatory Impairment in Alzheimer's Disease: A Case Series, 2000, Brain and Language.

[20] Frank Rudzicz, et al. Using linguistic features longitudinally to predict clinical scores for Alzheimer’s disease and related dementias, 2015, SLPAT@Interspeech.

[21] Sylvester Olubolu Orimaye, et al. Deep language space neural network for classifying mild cognitive impairment and Alzheimer-type dementia, 2018, PLoS ONE.

[22] DeLiang Wang, et al. A Feature Study for Classification-Based Speech Separation at Low Signal-to-Noise Ratios, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[23] Quoc V. Le, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, ICLR.

[24] Fasih Haider, et al. Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge, 2020, INTERSPEECH.

[25] Björn W. Schuller, et al. The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing, 2016, IEEE Transactions on Affective Computing.

[26] Tomas Mikolov, et al. Bag of Tricks for Efficient Text Classification, 2016, EACL.

[27] N. Schuff, et al. Multimodal imaging in Alzheimer's disease: validity and usefulness for early detection, 2015, Lancet Neurology.

[28] Theodoros Giannakopoulos. pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis, 2015, PLoS ONE.

[29] J. Becker, et al. The natural history of Alzheimer's disease. Description of study cohort and accuracy of diagnosis, 1994, Archives of neurology.

[30] B. MacWhinney. The CHILDES project: tools for analyzing talk, 1992.

[31] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[32] Tomas Mikolov, et al. Enriching Word Vectors with Subword Information, 2016, TACL.

[33] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.

[34] Veronika Vincze, et al. Speaking in Alzheimer’s Disease, is That an Early Sign? Importance of Changes in Language Abilities in Alzheimer’s Disease, 2015, Front. Aging Neurosci.