Automatic Assessment of Speech Intelligibility for Individuals With Aphasia

Traditional in-person therapy may be difficult to access for individuals with aphasia because of the shortage of speech-language pathologists and high treatment costs. Computerized exercises offer a promising low-cost and always-available supplement to in-person therapy. Unfortunately, the lack of feedback on verbal expression in existing programs hinders the applicability and effectiveness of this form of treatment. A prerequisite for producing meaningful feedback is speech intelligibility assessment. In this work, we investigate the feasibility of an automated system to assess three aspects of aphasic speech intelligibility: clarity, fluidity, and prosody. We introduce our aphasic speech corpus, which contains speech-based interactions between individuals with aphasia and a tablet-based application designed for therapeutic purposes. We present our method for eliciting reliable ground-truth labels for speech intelligibility based on the perceptual judgments of nonexpert human evaluators. We describe and analyze our feature set, engineered to capture pronunciation, rhythm, and intonation. We investigate the classification performance of our system under two conditions: one using human-labeled transcripts to drive feature extraction, and another using automatically generated transcripts. We show that some aspects of aphasic speech intelligibility can be estimated with human-level performance. Our results demonstrate the potential for computerized treatment of aphasia and lay the groundwork for bridging the gap between human and automatic intelligibility assessment.
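The abstract does not spell out the rhythm features. As an illustration only, the sketch below computes two widely used rhythm measures from the phonetics literature: the normalized Pairwise Variability Index (nPVI) over successive interval durations, and the %V/ΔV/ΔC measures over vocalic and consonantal intervals. It assumes those interval durations (in seconds) come from aligning a transcript, human-labeled or automatically generated, to the audio. The function names `npvi` and `rhythm_metrics` are hypothetical, and this is not necessarily the feature set used in this work.

```python
from statistics import mean, pstdev


def npvi(durations):
    """Normalized Pairwise Variability Index over successive interval durations.

    Each adjacent pair contributes |d_k - d_{k+1}| divided by the pair's mean
    duration; the average over all pairs is scaled by 100.
    """
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * mean(terms)


def rhythm_metrics(intervals):
    """Compute (%V, deltaV, deltaC) from (label, duration) pairs.

    Labels are assumed to be "V" (vocalic) or "C" (consonantal) intervals,
    e.g. from a forced alignment. %V is the share of total duration that is
    vocalic; deltaV and deltaC are standard deviations of interval durations
    (population standard deviation here, which is an assumption).
    """
    vowels = [d for label, d in intervals if label == "V"]
    consonants = [d for label, d in intervals if label == "C"]
    if not vowels or not consonants:
        raise ValueError("need at least one vocalic and one consonantal interval")
    total = sum(vowels) + sum(consonants)
    percent_v = 100 * sum(vowels) / total
    return percent_v, pstdev(vowels), pstdev(consonants)


if __name__ == "__main__":
    # Hypothetical alignment of a short utterance: alternating C/V intervals.
    aligned = [("C", 0.08), ("V", 0.15), ("C", 0.12), ("V", 0.22), ("C", 0.05)]
    vowel_durations = [d for label, d in aligned if label == "V"]
    print("nPVI (vocalic):", npvi(vowel_durations))
    print("%V, deltaV, deltaC:", rhythm_metrics(aligned))
```

Speech with more uneven interval durations yields a higher nPVI and larger ΔV/ΔC, which is one way such measures could help separate fluid speech from halting, irregular speech.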
