Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition

Dysarthria is a speech disorder caused by trauma to the brain areas concerned with the motor aspects of speech, giving rise to effortful, slow, slurred, or prosodically abnormal speech. Traditional automatic speech recognition (ASR) systems perform poorly on dysarthric speech, owing mostly to insufficient dysarthric speech data. Speaker-related challenges complicate the data collection process for dysarthric speech. In this paper, we explore data augmentation using temporal and speed modifications of healthy speech to simulate dysarthric speech. A DNN-HMM based ASR system and a Random Forest based classifier were used to evaluate the proposed method. The synthetically generated dysarthric speech is classified for severity level using a Random Forest classifier trained on actual dysarthric speech. An ASR system trained on healthy speech augmented with simulated dysarthric speech is evaluated on dysarthric speech recognition. All evaluations were carried out using the Universal Access dysarthric speech corpus. Absolute improvements of 4.24% and 2% were achieved using tempo-based and speed-based data augmentation, respectively, compared to ASR performance using healthy speech alone for training.
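The difference between the two modifications can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the 0.8 slow-down factor, and the frame and hop sizes are all assumptions chosen for illustration. Speed modification is shown as plain resampling, which lengthens the signal and lowers its pitch together. Tempo modification is shown as a deliberately naive overlap-add stretch, which changes duration without resampling. Unlike more careful methods such as WSOLA, it does not align frame phases, so it can distort strongly periodic (voiced) segments.

```python
import math

def speed_perturb(samples, factor):
    """Resample by linear interpolation (hypothetical helper).
    factor < 1 slows the signal down: it gets longer and its pitch drops."""
    n_out = int(round(len(samples) / factor))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * factor                  # fractional read position in the input
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out

def tempo_perturb(samples, factor, frame=400, hop=100):
    """Naive overlap-add time stretch (hypothetical helper).
    Frames are read every hop * factor samples and written every hop samples,
    so factor < 1 lengthens the signal without resampling it.
    No phase alignment is done, so this is only a rough approximation."""
    step_in = hop * factor
    n_frames = max(1, int((len(samples) - frame) / step_in) + 1)
    out_len = (n_frames - 1) * hop + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len
    # Hann-like window sampled at half-integer points, so no weight is exactly zero.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * (k + 0.5) / frame)
              for k in range(frame)]
    for f in range(n_frames):
        start = int(f * step_in)
        for k in range(frame):
            if start + k < len(samples):
                out[f * hop + k] += window[k] * samples[start + k]
                norm[f * hop + k] += window[k]
    # Divide by the accumulated window weight to keep the amplitude stable.
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]

# Illustrative input: one second of a 100 Hz tone at an assumed 16 kHz rate.
rate = 16000
tone = [math.sin(2.0 * math.pi * 100.0 * t / rate + 0.3) for t in range(rate)]
slow_speed = speed_perturb(tone, 0.8)   # about 1.25 s, pitch lowered
slow_tempo = tempo_perturb(tone, 0.8)   # about 1.25 s, no resampling
```

Both calls stretch the one-second input to roughly 1.25 seconds. In an augmentation pipeline, outputs like these would be added to the healthy training data alongside the unmodified recordings.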
