Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization

Dysarthric speech detection (DSD) systems aim to detect characteristics of the neuromotor disorder from speech. Such systems are particularly susceptible to domain mismatch, where the training data come from a source domain and the testing data from a target domain, and the two domains may differ in speech stimuli, disease etiology and other factors. Labelled target-domain data are hard to acquire because annotating sizeable datasets is costly. This paper makes a first attempt to formulate cross-domain DSD as an unsupervised domain adaptation (UDA) problem. Using labelled source-domain data and unlabelled target-domain data, we propose a multi-task learning strategy comprising dysarthria presence classification (DPC), domain adversarial training (DAT) and mutual information minimization (MIM), which together aim to learn dysarthria-discriminative and domain-invariant biomarker embeddings. Specifically, DPC helps the biomarker embeddings capture critical indicators of dysarthria; DAT forces the biomarker embeddings to be indistinguishable between the source and target domains; and MIM further reduces the statistical dependence between the biomarker embeddings and domain-related cues. With the UASPEECH and TORGO corpora treated as the source and target domains respectively, experiments show that incorporating UDA yields absolute increases of 22.2% in utterance-level weighted average recall and 20.0% in speaker-level accuracy.
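
The abstract does not spell out how the three objectives are combined, so the following is only a minimal, standard-library Python sketch under stated assumptions, not the authors' implementation. It assumes that the DPC, DAT and MIM losses are summed with a hypothetical weight `beta` on the MIM term, that DAT uses a gradient reversal layer scaled by a hypothetical `lam`, and that MIM is estimated with a CLUB-style mutual information upper bound (a swapped-in choice; the abstract does not name an estimator). Network outputs are passed in as plain probability lists so the arithmetic can be checked by hand; all function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical illustration only: function names, `lam` and `beta` are
# assumptions, not the paper's code. In a real system the probabilities
# below would be produced by neural networks on top of the biomarker encoder.


def cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log-likelihood of integer labels under per-sample class probabilities."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def reverse_gradient(grad: Sequence[float], lam: float) -> List[float]:
    """Backward rule of a gradient reversal layer: the forward pass is the
    identity, the backward pass multiplies the incoming gradient by -lam, so
    the encoder is pushed to *increase* the domain classifier's loss."""
    return [-lam * g for g in grad]


def club_mi_estimate(q_probs: Sequence[Sequence[float]], domains: Sequence[int]) -> float:
    """CLUB-style sample estimate of I(z; d) for a discrete domain label d.

    q_probs[i][k] approximates q(d = k | z_i) from a variational network
    (assumed to be trained separately by maximising log q(d_i | z_i)).
    Positive term: log q(d_i | z_i) over matched pairs.
    Negative term: log q(d_j | z_i) averaged over all (i, j) pairs,
    i.e. as if embeddings and domain labels were independent.
    """
    n = len(domains)
    positive = sum(math.log(q_probs[i][domains[i]]) for i in range(n)) / n
    negative = sum(
        math.log(q_probs[i][domains[j]]) for i in range(n) for j in range(n)
    ) / (n * n)
    return positive - negative


def total_loss(
    dpc_probs: Sequence[Sequence[float]],
    dpc_labels: Sequence[int],
    dom_probs: Sequence[Sequence[float]],
    dom_labels: Sequence[int],
    q_probs: Sequence[Sequence[float]],
    beta: float = 0.1,
) -> float:
    """Scalar multi-task loss: DPC on labelled source data, DAT on source and
    target data (its sign flip for the encoder happens in the gradient
    reversal layer, not here), plus a beta-weighted MIM term."""
    l_dpc = cross_entropy(dpc_probs, dpc_labels)
    l_dat = cross_entropy(dom_probs, dom_labels)
    l_mim = club_mi_estimate(q_probs, dom_labels)
    return l_dpc + l_dat + beta * l_mim
```

For example, if the variational network assigns the same domain probabilities to every embedding, `club_mi_estimate` returns 0, whereas `[[0.9, 0.1], [0.1, 0.9]]` with domains `[0, 1]` gives (log 0.9 - log 0.1)/2 ≈ 1.10, signalling that the embeddings still reveal their domain. Under these assumptions, the encoder would receive the DPC gradient, the reversed DAT gradient and `beta` times the MIM gradient, while the domain classifier and the variational network are each trained on their own objectives.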
