Automatic speech recognition in the diagnosis of primary progressive aphasia

Narrative speech can provide a valuable source of information about an individual’s linguistic abilities across lexical, syntactic, and pragmatic levels. However, analysis of narrative speech is typically done by hand and is therefore extremely time-consuming. Automatic speech recognition (ASR) software could make this type of analysis more efficient and more widely available. In this paper, we present the results of an initial attempt to use ASR technology to generate transcripts of spoken narratives from participants with semantic dementia (SD) or progressive nonfluent aphasia (PNFA) and from healthy controls. We extract text features from the transcripts and use these features, alone and in combination with acoustic features from the speech signals, to classify transcripts as patient versus control and as SD versus PNFA. Additionally, we generate artificially noisy transcripts by applying insertions, substitutions, and deletions to manually transcribed data, which allows experiments to be conducted across a wider range of noise levels than a tuned ASR system produces. We find that reasonably good classification accuracies can be achieved by selecting appropriate features from the noisy transcripts. We also find that the choice between ASR data and manually transcribed data as the training set can have a strong effect on the accuracy of the classifiers.
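The artificial noising step mentioned in the abstract (insertions, substitutions, and deletions applied to a manual transcript) can be illustrated with a minimal sketch. The abstract does not describe the actual procedure, so everything here is an assumption made for illustration: the function name `add_noise`, the single `error_rate` parameter, the uniform split among the three error types, and the sampling of replacement words uniformly from a supplied `vocab` list.

```python
import random


def add_noise(words, error_rate, vocab, seed=None):
    """Return a copy of `words` with simulated recognition errors.

    Hypothetical sketch: each word is corrupted with probability
    `error_rate`, and the error type is chosen uniformly at random.
    """
    rng = random.Random(seed)  # local generator, so runs are reproducible
    noisy = []
    for word in words:
        if rng.random() >= error_rate:
            noisy.append(word)  # word is kept unchanged
            continue
        op = rng.choice(("delete", "substitute", "insert"))
        if op == "delete":
            continue  # drop the word entirely
        if op == "substitute":
            # Replacement may coincide with the original word by chance.
            noisy.append(rng.choice(vocab))
        else:
            # Insertion: keep the word and add a spurious word after it.
            noisy.append(word)
            noisy.append(rng.choice(vocab))
    return noisy


if __name__ == "__main__":
    transcript = "the boy went to the shop with his mother".split()
    vocabulary = ["dog", "then", "um", "house", "and"]
    for rate in (0.0, 0.2, 0.4):
        print(rate, " ".join(add_noise(transcript, rate, vocabulary, seed=1)))
```

Sweeping `error_rate` over a range of values yields transcripts at controlled noise levels, which is the purpose the abstract gives for simulating errors instead of relying only on the output of a tuned recognizer.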
