Dysarthric Speech Recognition Using Convolutional LSTM Neural Network

Dysarthria is a motor speech disorder that impedes the physical production of speech. The speech of patients with dysarthria is generally characterized by poor articulation, breathy voice, and monotonic intonation. Modeling the spectral and temporal characteristics of dysarthric speech is therefore critical for improving dysarthric speech recognition. Convolutional long short-term memory recurrent neural networks (CLSTM-RNNs) have recently been used successfully for normal speech recognition but have rarely been applied to dysarthric speech recognition. We hypothesized that CLSTM-RNNs have the potential to capture the distinct characteristics of dysarthric speech by combining convolutional neural networks (CNNs), which extract effective local features, with LSTM-RNNs, which model the temporal dependencies of those features. In this paper, we investigate the use of CLSTM-RNNs for dysarthric speech recognition. Experimental evaluation on a database collected from nine dysarthric patients showed that our approach provides substantial improvement over both standard CNN-based and LSTM-RNN-based speech recognizers.
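To make the CNN-then-LSTM idea concrete, here is a minimal NumPy forward-pass sketch, not the authors' configuration. It assumes log-mel filterbank input of shape (frames, frequency bins), a single layer of 1-D filters convolved along the frequency axis with ReLU and non-overlapping max pooling (a common choice in convolutional-recurrent acoustic models), one unidirectional LSTM layer over the pooled features, and a per-frame softmax over a hypothetical set of output units. The function names `conv_freq`, `lstm`, `init_params`, and `clstm_forward`, all layer sizes, and the random untrained weights are illustrative assumptions.

```python
import numpy as np


def conv_freq(x, kernels, bias, pool):
    """Convolve each frame along frequency, apply ReLU, then max-pool over frequency.

    x: (T, F) features; kernels: (C, K) shared 1-D filters; bias: (C,).
    Returns (T, P * C), where P = (F - K + 1) // pool.
    """
    T, F = x.shape
    C, K = kernels.shape
    n_valid = F - K + 1
    # Sliding frequency windows for every frame: (T, n_valid, K)
    windows = np.stack([x[:, i:i + K] for i in range(n_valid)], axis=1)
    # Filter responses followed by ReLU: (T, n_valid, C)
    feat = np.maximum(windows @ kernels.T + bias, 0.0)
    # Non-overlapping max pooling along frequency: (T, P, C)
    P = n_valid // pool
    feat = feat[:, :P * pool, :].reshape(T, P, pool, C).max(axis=2)
    return feat.reshape(T, P * C)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm(seq, W, U, b):
    """Standard LSTM (no peepholes) over seq of shape (T, D).

    W: (D, 4H), U: (H, 4H), b: (4H,). Gate order: input, forget, candidate, output.
    Returns the hidden state at every time step, shape (T, H).
    """
    H = U.shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    outputs = []
    for x_t in seq:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:H])
        f = sigmoid(z[H:2 * H])
        g = np.tanh(z[2 * H:3 * H])
        o = sigmoid(z[3 * H:])
        c = f * c + i * g          # update the memory cell
        h = o * np.tanh(c)         # expose a gated view of the cell
        outputs.append(h)
    return np.stack(outputs)


def init_params(n_freq, n_out, channels=8, kernel=5, pool=3, hidden=64, seed=0):
    """Random (untrained) weights with hypothetical sizes."""
    rng = np.random.default_rng(seed)
    conv_dim = channels * ((n_freq - kernel + 1) // pool)
    s = 0.1
    return {
        "kernels": rng.normal(0, s, (channels, kernel)),
        "conv_b": np.zeros(channels),
        "W": rng.normal(0, s, (conv_dim, 4 * hidden)),
        "U": rng.normal(0, s, (hidden, 4 * hidden)),
        "b": np.zeros(4 * hidden),
        "W_out": rng.normal(0, s, (hidden, n_out)),
        "b_out": np.zeros(n_out),
    }


def clstm_forward(x, params, pool=3):
    """CNN front end -> LSTM -> per-frame softmax posteriors, shape (T, n_out)."""
    local = conv_freq(x, params["kernels"], params["conv_b"], pool)   # local spectral features
    h_seq = lstm(local, params["W"], params["U"], params["b"])        # temporal modeling
    logits = h_seq @ params["W_out"] + params["b_out"]
    logits -= logits.max(axis=1, keepdims=True)                     # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```

For a 50-frame utterance with 40 filterbank channels, `clstm_forward(x, init_params(40, 10))` returns a (50, 10) array whose rows each sum to 1. The convolution shares filters across frequency positions so that local spectral patterns are detected wherever they occur, while the LSTM carries information across frames; the `pool` value passed to `clstm_forward` must match the one used in `init_params`.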
