Optimal Transport-based Adaptation in Dysarthric Speech Tasks

In many real-world applications, a mismatch between the distributions of the training data (source) and the test data (target) significantly degrades the performance of machine learning algorithms. In speech data, causes of this mismatch include differing acoustic environments and speaker characteristics. In this paper, we address this issue in the challenging context of dysarthric speech through multi-source domain/speaker adaptation (MSDA/MSSA). Specifically, we propose the use of an optimal transport-based approach called MSDA via Weighted Joint Distribution Optimal Transport (MSDA-WJDOT). We first confront the mismatch problem in dysarthria detection, where the proposed approach outperforms both the baseline and state-of-the-art MSDA models, improving detection accuracy by 0.9% over the best competing method. We then employ MSDA-WJDOT for dysarthric speaker adaptation in command speech recognition, where it yields relative Command Error Rate reductions of 16% and 7% over the baseline and the best competing model, respectively. Interestingly, MSDA-WJDOT also provides a similarity score between each source and the target, i.e., between speakers in this case. We leverage this similarity measure to define a Dysarthric score and a Healthy score for the target speaker, and use them to diagnose dysarthria with an accuracy of 95%.
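As a rough illustration of how per-speaker similarity weights might be turned into a Dysarthric score and a Healthy score, the following minimal sketch sums the weights assigned to dysarthric and to healthy source speakers. The abstract does not state the exact definition of the scores, so the aggregation rule, the function names `speaker_scores` and `diagnose`, and the example weights are all assumptions made for illustration only.

```python
def speaker_scores(alpha, is_dysarthric):
    """Aggregate source-speaker similarity weights into two scores.

    alpha          -- non-negative similarity weights, one per source speaker
                      (assumed here to come from an adaptation method such as
                      MSDA-WJDOT; the values used below are made up)
    is_dysarthric  -- booleans, True if the matching source speaker is dysarthric

    Returns (dysarthric_score, healthy_score), normalised to sum to 1.
    This summation rule is an assumption, not the paper's stated definition.
    """
    if len(alpha) != len(is_dysarthric):
        raise ValueError("alpha and is_dysarthric must have the same length")
    if any(a < 0 for a in alpha):
        raise ValueError("weights must be non-negative")
    total = sum(alpha)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    dysarthric = sum(a for a, d in zip(alpha, is_dysarthric) if d) / total
    return dysarthric, 1.0 - dysarthric


def diagnose(alpha, is_dysarthric):
    """Label the target speaker by the larger of the two scores (hypothetical rule)."""
    dysarthric, healthy = speaker_scores(alpha, is_dysarthric)
    return "dysarthric" if dysarthric > healthy else "healthy"


if __name__ == "__main__":
    # Hypothetical weights over three source speakers.
    weights = [0.5, 0.25, 0.25]
    labels = [True, False, True]
    print(speaker_scores(weights, labels))
    print(diagnose(weights, labels))
```

Under this assumed rule, a target speaker whose weight mass falls mostly on dysarthric source speakers is labelled dysarthric; ties fall to the healthy label.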
