Investigation of an HMM/ANN hybrid structure in pattern recognition application using cepstral analysis of dysarthric (distorted) speech signals.

Computer speech recognition for individuals with dysarthria, such as patients with cerebral palsy, requires a robust technique that can handle conditions of very high variability and limited training data. In this study, a 10-state ergodic hidden Markov model (HMM)/artificial neural network (ANN) hybrid structure was investigated for a dysarthric isolated-word recognition system intended to act as an assistive tool. A small vocabulary spoken by three subjects with cerebral palsy was chosen. The effect of the hybrid structure on the recognition rate was assessed by comparing it against a plain ergodic HMM serving as a control, in order to determine whether the modification enhanced recognition of dysarthric speech. The speech was sampled at 11 kHz, and Mel-frequency cepstral coefficients were extracted from 15 ms frames to serve as training input to the hybrid model. The results demonstrated that the hybrid structure was quite robust in handling the large variability and non-conformity of dysarthric speech, although the level of variability in the input speech patterns sometimes limits the reliability of the system. Nevertheless, its application as a rehabilitation/control tool to assist motor-impaired dysarthric individuals holds considerable promise.
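The front end described above (11 kHz speech, 15 ms frames, Mel-frequency cepstral coefficients) can be sketched roughly as follows. This is a minimal illustrative implementation in plain NumPy, not the authors' code; the FFT size, number of mel filters, and number of cepstral coefficients are assumed values chosen for the example.

```python
import numpy as np

def mfcc(signal, sr=11000, frame_ms=15, n_fft=256, n_mels=20, n_ceps=12):
    """Sketch of MFCC extraction: frame, window, power spectrum,
    mel filterbank, log, DCT. Parameter defaults beyond the 11 kHz
    rate and 15 ms frames are illustrative assumptions."""
    frame_len = int(sr * frame_ms / 1000)        # 165 samples at 11 kHz
    hop = frame_len // 2                         # 50% overlap (assumed)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Power spectrum of each (zero-padded) frame
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2

    # Triangular mel-spaced filterbank
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)

    # Log mel energies, then DCT-II to decorrelate into cepstral coefficients
    logmel = np.log(spec @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T                        # shape: (n_frames, n_ceps)
```

In a recognizer like the one studied here, each row of the returned matrix would serve as one observation vector for the HMM/ANN model.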
