Artificial neural networks as speech recognisers for dysarthric speech: Identifying the best-performing set of MFCC parameters and studying a speaker-independent approach

Dysarthria is a neurological impairment in the control of the motor speech articulators that compromises the speech signal. Automatic Speech Recognition (ASR) can be very helpful for speakers with dysarthria because these speakers are often also physically incapacitated. Mel-Frequency Cepstral Coefficients (MFCCs) have been shown to be an appropriate representation of dysarthric speech, but the question of which MFCC-based feature set represents dysarthric acoustic features most effectively has not been answered. Moreover, most current dysarthric speech recognisers are either speaker-dependent (SD) or speaker-adaptive (SA), and they generalise poorly when used as speaker-independent (SI) models. First, by comparing the results of 28 dysarthric SD speech recognisers, this study identifies the best-performing set of MFCC parameters for representing dysarthric acoustic features in Artificial Neural Network (ANN)-based ASR. Next, this paper studies the application of ANNs as a fixed-length isolated-word SI ASR for individuals with dysarthria. The results show that the speech recognisers trained on the conventional 12-coefficient MFCC features, without delta and acceleration features, provided the best accuracy, and that the proposed SI ASR recognised the speech of unseen dysarthric evaluation subjects with a word recognition rate of 68.38%.
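To make the compared feature sets concrete, the following minimal sketch shows how delta and acceleration features are commonly appended to a 12-coefficient static MFCC vector, giving 12, 24 or 36 values per frame. It assumes the widely used regression formula with a window of two frames and edge frames repeated at the boundaries; the abstract does not state which delta computation the authors used, and the names delta and stack_features are hypothetical.

```python
def delta(frames, window=2):
    """Regression-based delta over a list of per-frame coefficient vectors.

    Assumed formula (common in ASR front ends, not taken from the paper):
    d_t = sum_{n=1..W} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..W} n^2),
    with the first and last frames repeated at the edges.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(static, order=0):
    """Append delta (order >= 1) and acceleration (order == 2) to static MFCCs."""
    feats = [list(f) for f in static]
    current = static
    for _ in range(order):
        current = delta(current)  # the delta of the delta is the acceleration
        feats = [f + d for f, d in zip(feats, current)]
    return feats


# Placeholder input: 10 frames of 12 coefficients forming a linear ramp.
# Real MFCCs would come from a front end that is outside this sketch.
static_mfcc = [[float(t)] * 12 for t in range(10)]

static_only = stack_features(static_mfcc, order=0)  # 12 values per frame
with_delta = stack_features(static_mfcc, order=1)   # 24 values per frame
with_accel = stack_features(static_mfcc, order=2)   # 36 values per frame
```

Under these assumptions, the configuration the abstract reports as most accurate corresponds to order=0, in which each frame carries only the 12 static coefficients.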
