Automatic Speech Intelligibility Detection for Speakers with Speech Impairments: The Identification of Significant Speech Features

Selecting relevant features is important for discriminating speech in a detection-based ASR system, and it contributes to improved detector performance. In the context of speech impairments, speech errors can be distinguished from regular speech by adopting speech features that discriminate well between the impaired and control groups. However, the identification of suitable discriminative speech features for error detection in impaired speech has not been well investigated in the literature. The characteristics of impaired speech differ grossly from those of regular speech, which makes existing speech features less effective for recognizing impaired speech. To address this gap, speech features of impaired speech based on prosody, pronunciation, and voice quality are analyzed to identify the significant features related to intelligibility deficits. In this research, we investigate how speech impairments due to cerebral palsy and hearing impairment relate to prosody, pronunciation, and voice quality. We then identify the relationship between the speech features and speech intelligibility classification, and determine which features significantly improve the discriminative ability of an automatic speech intelligibility detection system. The findings show that prosody, pronunciation, and voice quality features are statistically significant for improving the detection of impaired speech. Voice quality is identified as the feature group with the most discriminative power for detecting the speech intelligibility of impaired speech.
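As an illustration of what testing a feature for statistical significance between the impaired and control groups can look like, the following is a minimal sketch only. The abstract does not state which statistical test was used; the permutation test, the function name `permutation_test`, and the per-speaker feature values below are all assumptions made for this example and are not taken from the study.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Hypothetical helper for illustration; not the method reported in the paper.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to two groups of the original sizes.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Made-up per-speaker values of a single voice-quality feature (assumption).
impaired = [2.9, 3.4, 3.1, 3.8, 2.7, 3.5, 3.2, 3.6]
control = [1.1, 0.9, 1.3, 1.0, 1.2, 0.8, 1.4, 1.1]

p_value = permutation_test(impaired, control)
print(f"estimated p-value: {p_value:.4f}")
```

A small estimated p-value would suggest that the feature separates the two groups more than random relabeling of speakers would. A permutation test is used here only because it needs no distributional assumptions and runs with the standard library; a t-test or ANOVA would be an equally plausible choice.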
