Paper Information - Automated Dysarthria Severity Classification Using Deep Learning Frameworks

Automated Dysarthria Severity Classification Using Deep Learning Frameworks

Dysarthria is a neuro-motor speech disorder that renders speech unintelligible in proportion to its severity. Assessing the severity level of dysarthria, apart from being a diagnostic step to evaluate the patient's improvement, can also aid automatic dysarthric speech recognition systems. In this paper, a detailed study of dysarthria severity classification is carried out using various deep learning architectures, namely the deep neural network (DNN), the convolutional neural network (CNN), and the long short-term memory network (LSTM). Mel frequency cepstral coefficients (MFCCs) and their derivatives are used as features. The performance of these models is compared with that of a baseline support vector machine (SVM) classifier using the UA-Speech corpus and the TORGO database. The highest classification accuracies of 96.18% and 93.24% are reported for TORGO and UA-Speech, respectively. Detailed analysis of the performance of these models shows that a proper choice of deep learning architecture can ensure better performance than the conventionally used SVM classifier.
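The abstract names MFCCs and their derivatives as the input features but does not say how the derivatives are computed. As a minimal sketch, assuming the common regression-based delta formula with a window of two frames and repeated edge frames, the static coefficients can be stacked with their first and second derivatives as shown here. The function names `delta` and `stack_with_derivatives`, the window size, and the padding choice are illustrative assumptions, not details taken from the paper.

```python
from typing import List

Frame = List[float]  # one frame of static coefficients (e.g. MFCCs)


def delta(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based time derivative of a feature sequence.

    Assumed formula (not confirmed by the paper):
        d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2)
    Frames beyond either end are replaced by the nearest edge frame.
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out: List[Frame] = []
    for t in range(len(frames)):
        row: Frame = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]   # clamp at the last frame
                behind = frames[max(t - n, 0)][k]     # clamp at the first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_derivatives(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Concatenate static, delta, and delta-delta coefficients per frame."""
    d1 = delta(frames, window)
    d2 = delta(d1, window)
    return [s + a + b for s, a, b in zip(frames, d1, d2)]
```

With, for example, 13 static coefficients per frame, `stack_with_derivatives` yields 39 values per frame, which could then be passed to any of the classifiers the abstract compares.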

Rajeev Rajan | Amlu Anna Joshy | R. Rajan
