Enriching Complex Networks with Word Embeddings for Detecting Mild Cognitive Impairment from Speech Transcripts

Mild Cognitive Impairment (MCI) is a mental disorder that is difficult to diagnose. Linguistic features, mainly extracted by parsers, have been used to detect MCI, but this approach is not suitable for large-scale assessments. The disfluencies associated with MCI produce ungrammatical speech, so transcripts require manual or high-precision automatic correction. In this paper, we modeled transcripts as complex networks and enriched them with word embeddings (CNE) to better represent the short texts produced in neuropsychological assessments. Measurements extracted from the networks were fed to well-known classifiers to automatically identify MCI in transcripts, framed as a binary classification task. We compared this approach with traditional approaches based on Bag of Words (BoW) and on linguistic features, using three datasets: DementiaBank in English, and Cinderella and Arizona-Battery in Portuguese. Overall, CNE provided higher accuracy than complex networks alone, and Support Vector Machine was superior to the other classifiers. CNE provided the highest accuracies for DementiaBank and Cinderella, but BoW was more effective for the Arizona-Battery dataset, probably owing to its short narratives. The approach using linguistic features yielded higher accuracy when the transcriptions of the Cinderella dataset were manually revised. Taken together, the results indicate that complex networks enriched with embeddings are promising for detecting MCI in large-scale assessments.
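
The idea of enriching a word network with embeddings can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a word adjacency network (edges between consecutive words), adds an extra edge whenever the cosine similarity between two word vectors reaches a threshold, and computes two common network measurements. The function names, the toy two-dimensional vectors, the 0.9 threshold, and the choice of measurements are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_enriched_network(tokens, embeddings, threshold=0.9):
    """Undirected word network: adjacency edges plus embedding-based edges.

    `threshold` is an illustrative value, not one taken from the paper.
    """
    graph = {word: set() for word in tokens}
    # Co-occurrence edges: link each word to the word that follows it.
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    # Enrichment edges: link words whose vectors are similar enough.
    for a, b in combinations(sorted(graph), 2):
        if a in embeddings and b in embeddings:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def average_degree(graph):
    """Mean number of neighbours per node."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def average_clustering(graph):
    """Mean local clustering coefficient (nodes with degree < 2 count as 0)."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


# Toy transcript and toy 2-d vectors (hypothetical data, not from any dataset).
tokens = ["girl", "lost", "shoe", "prince", "found", "slipper"]
toy_embeddings = {"shoe": [1.0, 0.0], "slipper": [0.99, 0.1], "prince": [0.0, 1.0]}

plain = build_enriched_network(tokens, {})
enriched = build_enriched_network(tokens, toy_embeddings)
features = [average_degree(enriched), average_clustering(enriched)]
```

In this sketch the enrichment links "shoe" and "slipper", which never occur next to each other in the toy transcript. A vector such as `features`, computed per transcript, is the kind of input that could then be passed to a classifier such as a Support Vector Machine.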
