Automated classification of primary progressive aphasia subtypes from narrative speech transcripts

In the early stages of neurodegenerative disorders, individuals may exhibit a decline in language abilities that is difficult to quantify with standardized tests. Careful analysis of connected speech can provide valuable information about a patient's language capacities, but to date this type of analysis has been limited by its time-consuming nature. In this study, we present a method for evaluating and classifying connected speech in primary progressive aphasia using computational techniques. Syntactic and semantic features were automatically extracted from transcriptions of narrative speech for three groups: semantic dementia (SD), progressive nonfluent aphasia (PNFA), and healthy controls. Features that varied significantly between the groups were used to train machine learning classifiers, which were then tested on held-out data. We achieved accuracies well above baseline on the three binary classification tasks. An analysis of the influential features showed that, in contrast with controls, both patient groups tended to use words that were higher in frequency (especially nouns for SD and verbs for PNFA). The SD patients also tended to use words (especially nouns) that were higher in familiarity, and they produced fewer nouns, but more demonstratives and adverbs, than controls. The speech of the PNFA group tended to be slower and to incorporate shorter words than that of controls. The patient groups were distinguished from each other by the SD patients' relatively increased use of words that are high in frequency and/or familiarity.
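The general shape of such a pipeline (extract features from transcripts, keep the features that differ between two groups, train a classifier, and evaluate on held-out data) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the three toy lexical features, the `freq_norms` lookup table, the Welch t-statistic threshold of 2.0, the nearest-centroid classifier, and the leave-one-out evaluation are all assumptions chosen to keep the example dependency-free. Feature selection is repeated inside each training fold so that the held-out transcript never influences which features are kept.

```python
import math
from statistics import mean, pstdev, variance


def extract_features(transcript, freq_norms):
    """Toy lexical features. `freq_norms` is a hypothetical word -> log-frequency table."""
    words = transcript.lower().split()
    if not words:
        raise ValueError("empty transcript")
    known = [freq_norms[w] for w in words if w in freq_norms]
    return {
        "mean_word_length": mean(len(w) for w in words),
        "type_token_ratio": len(set(words)) / len(words),
        "mean_log_frequency": mean(known) if known else 0.0,
    }


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least two values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def select_features(rows, labels, threshold=2.0):
    """Keep features whose |t| between the two classes reaches the (assumed) threshold."""
    first, second = sorted(set(labels))  # binary task: exactly two classes
    selected = []
    for name in rows[0]:
        a = [r[name] for r, y in zip(rows, labels) if y == first]
        b = [r[name] for r, y in zip(rows, labels) if y == second]
        if abs(welch_t(a, b)) >= threshold:
            selected.append(name)
    return selected


def predict(train_rows, train_labels, test_row, names):
    """Nearest-centroid prediction on z-scored features (a stand-in classifier)."""
    if not names:  # nothing selected: fall back to the majority class
        return max(set(train_labels), key=train_labels.count)
    scale = {}
    for n in names:
        col = [r[n] for r in train_rows]
        scale[n] = (mean(col), pstdev(col) or 1.0)

    def z(row):
        return [(row[n] - scale[n][0]) / scale[n][1] for n in names]

    best_label, best_dist = None, math.inf
    for label in sorted(set(train_labels)):
        members = [z(r) for r, y in zip(train_rows, train_labels) if y == label]
        centroid = [mean(col) for col in zip(*members)]
        dist = math.dist(centroid, z(test_row))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(rows, labels):
    """Hold out each sample once; select features and fit on the rest only."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        names = select_features(train_rows, train_labels)
        correct += predict(train_rows, train_labels, rows[i], names) == labels[i]
    return correct / len(rows)
```

In use, each transcript would be passed through `extract_features`, and the resulting list of feature dictionaries, together with a parallel list of group labels for one pair of groups (for example SD versus controls), would be passed to `leave_one_out_accuracy`. The result can then be compared against a majority-class baseline.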
