Comparison of different feature sets for identification of variants in progressive aphasia

We use computational techniques to extract a large number of features from the narrative speech of individuals with primary progressive aphasia (PPA). We examine several types of features (part-of-speech, complexity, context-free grammar, fluency, psycholinguistic, vocabulary richness, and acoustic) and discuss the circumstances under which each can be extracted. We consider the task of training a machine learning classifier to determine whether a participant is a control or has the fluent or nonfluent variant of PPA. We first evaluate the individual feature sets on their classification accuracy, then perform an ablation study to determine the optimal combination of feature sets. Finally, we rank the features in four practical scenarios: given audio data only, given unsegmented transcripts only, given segmented transcripts only, and given both audio and segmented transcripts. We find that psycholinguistic features are highly discriminative in most cases, and that acoustic, context-free grammar, and part-of-speech features can also be important in some circumstances.
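The ablation step over feature sets can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: the nearest-centroid classifier, the leave-one-out evaluation, the single-pass "remove one set at a time" procedure, the function names, and the toy data with two hypothetical feature columns.

```python
from collections import defaultdict


def centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    groups = defaultdict(list)
    for row, label in zip(train_X, train_y):
        groups[label].append(row)
    best_label, best_dist = None, float("inf")
    for label, rows in groups.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, columns):
    """Leave-one-out accuracy using only the given feature columns."""
    projected = [[row[c] for c in columns] for row in X]
    correct = 0
    for i in range(len(projected)):
        train_X = projected[:i] + projected[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if centroid_predict(train_X, train_y, projected[i]) == y[i]:
            correct += 1
    return correct / len(projected)


def ablation(X, y, feature_sets):
    """Return accuracy with all sets, and accuracy with each set removed in turn."""
    all_cols = sorted(c for cols in feature_sets.values() for c in cols)
    baseline = loo_accuracy(X, y, all_cols)
    without_set = {}
    for name in feature_sets:
        kept = sorted(
            c for other, cols in feature_sets.items() if other != name for c in cols
        )
        without_set[name] = loo_accuracy(X, y, kept)
    return baseline, without_set


# Toy data (hypothetical): column 0 stands in for a psycholinguistic feature,
# column 1 for an acoustic feature. Two participants per class.
X = [
    [0.0, 0.5], [0.1, 0.4],   # control
    [1.0, 0.5], [1.1, 0.4],   # fluent variant
    [2.0, 0.5], [2.1, 0.4],   # nonfluent variant
]
y = ["control", "control", "fluent", "fluent", "nonfluent", "nonfluent"]
feature_sets = {"psycholinguistic": [0], "acoustic": [1]}

baseline, without_set = ablation(X, y, feature_sets)
print("all feature sets:", baseline)
for name, acc in without_set.items():
    print(f"without {name}: {acc}")
```

A large drop in accuracy when a set is removed suggests that set carries information the others do not. An iterative variant would repeatedly discard the set whose removal hurts least and re-evaluate the remaining sets.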
