Exploring Appropriate Acoustic and Language Modelling Choices for Continuous Dysarthric Speech Recognition

There has been much recent interest in building continuous speech recognition systems for people with severe speech impairments such as dysarthria. However, the datasets commonly used for this purpose were typically designed for tasks other than ASR development, or contain only isolated words; as a result, the prompts read by the speakers overlap heavily across recordings. Previous ASR evaluations have often neglected this, training language models (LMs) on data that is not disjoint from the test set and thereby producing potentially unrealistically optimistic results. In this paper, we investigate the impact of LM design using the widely used TORGO database. We combine state-of-the-art acoustic models with LMs trained on data from LibriSpeech. Using LMs of varying vocabulary size, we examine the trade-off between the out-of-vocabulary rate and recognition confusions for speakers with differing degrees of dysarthria. We find that the optimal LM complexity is highly speaker dependent, highlighting the need to design speaker-dependent LMs alongside speaker-dependent acoustic models when working with atypical speech.
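To make the vocabulary-size/OOV trade-off concrete, the following minimal Python sketch measures the out-of-vocabulary rate of a test transcript against vocabularies truncated to the N most frequent words of an LM training corpus. This is an illustration only, not the authors' pipeline: the file names are hypothetical, and frequency-based truncation is an assumed (though common) way of varying vocabulary size.

```python
from collections import Counter

def oov_rate(test_tokens, vocab):
    """Fraction of test tokens not covered by the vocabulary."""
    misses = sum(1 for t in test_tokens if t not in vocab)
    return misses / len(test_tokens)

# Hypothetical file names; the paper draws LM training text from
# LibriSpeech and evaluates on TORGO prompts.
with open("librispeech_lm_train.txt") as f:
    counts = Counter(f.read().split())

with open("torgo_test_prompts.txt") as f:
    test_tokens = f.read().split()

# Sweep the vocabulary size: a larger vocabulary lowers the OOV rate
# but gives the recogniser more candidate words to confuse, mirroring
# the per-speaker trade-off the paper studies.
for size in (5_000, 20_000, 50_000, 100_000):
    vocab = {w for w, _ in counts.most_common(size)}
    print(f"|V|={size:>7,}  OOV rate={oov_rate(test_tokens, vocab):.3%}")
```

In practice, where the sweet spot lies on this curve depends on the speaker: for more severely dysarthric speakers, constraining the vocabulary can reduce acoustic confusions more than the added OOV errors cost, which is the speaker dependence the abstract highlights.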
