Acoustic-to-Articulatory Inversion for Dysarthric Speech by Using Cross-Corpus Acoustic-Articulatory Data

In this work, we focus on estimating articulatory movements from acoustic features, known as acoustic-to-articulatory inversion (AAI), for dysarthric patients with amyotrophic lateral sclerosis (ALS). Compared with AAI on speech from healthy subjects, AAI on dysarthric speech poses two potential challenges. First, because of speech impairment, the pronunciation of dysarthric patients is unclear and inaccurate, which could degrade AAI performance. Second, acoustic-articulatory data from dysarthric patients is limited because it is difficult to record. These challenges motivate us to utilize cross-corpus acoustic-articulatory data. We propose an AAI model that is conditioned on speaker information through x-vectors at the input and that produces multi-target articulatory trajectory outputs, with a separate output for each corpus. Results show that the proposed AAI model achieves relative improvements in the Pearson correlation coefficient (CC) of ~13.16% and ~16.45% over a randomly initialized baseline AAI model trained on only the dysarthric corpus, in seen and unseen conditions, respectively. In the seen conditions, the proposed AAI model outperforms the three baseline AAI models that utilize the cross-corpus data by ~3.49%, ~6.46%, and ~4.03% in terms of CC.
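The results above are reported as relative improvements in CC. As a minimal sketch of how such a figure could be computed, assuming predicted and ground-truth articulatory trajectories are stored as arrays of shape (frames, dimensions): the function names `pearson_cc` and `relative_improvement` are hypothetical, the averaging over dimensions is an assumption, and the trajectories below are synthetic and are not data from this work.

```python
import numpy as np


def pearson_cc(y_true, y_pred):
    """Pearson CC for each articulatory dimension; inputs are (frames, dims)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Center each dimension over time.
    t = y_true - y_true.mean(axis=0)
    p = y_pred - y_pred.mean(axis=0)
    num = (t * p).sum(axis=0)
    den = np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    return num / den


def relative_improvement(cc_baseline, cc_proposed):
    """Relative CC improvement of a proposed model over a baseline, in percent."""
    return 100.0 * (cc_proposed - cc_baseline) / cc_baseline


if __name__ == "__main__":
    # Synthetic stand-ins for trajectories (assumption: 200 frames, 12 dims).
    rng = np.random.default_rng(0)
    truth = rng.standard_normal((200, 12))
    baseline_pred = truth + 1.0 * rng.standard_normal(truth.shape)
    proposed_pred = truth + 0.5 * rng.standard_normal(truth.shape)

    # Average the per-dimension CC into one score per model.
    cc_base = pearson_cc(truth, baseline_pred).mean()
    cc_prop = pearson_cc(truth, proposed_pred).mean()
    print(f"baseline CC: {cc_base:.3f}, proposed CC: {cc_prop:.3f}")
    print(f"relative improvement: {relative_improvement(cc_base, cc_prop):.2f}%")
```

Under this definition, a relative improvement of ~13.16% means the proposed model's CC is about 1.1316 times the baseline's CC; it is not an absolute gain of 0.1316 in CC.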
