Towards Interpreting Deep Learning Models to Understand Loss of Speech Intelligibility in Speech Disorders - Step 1: CNN Model-Based Phone Classification

Perceptual measurement remains the most common method for assessing disordered speech in clinical practice. The subjectivity of such measures, inherent to human judgment, together with their lack of interpretability with regard to local alterations of speech units, strongly motivates a sophisticated tool for objective evaluation. Of interest is the increasing performance of Deep Neural Networks in speech applications, and more importantly the fact that they are no longer considered black boxes. The work presented here is the first step of a long-term research project that aims to determine the linguistic units contributing most to the maintenance or loss of intelligibility in speech disorders. In this context, we study a CNN trained on normal speech for a phone classification task and tested on pathological speech. The aim of this first study is to analyze the response of the CNN model to disordered speech, in order to later assess its effectiveness in providing relevant knowledge in terms of speech severity or loss of intelligibility. The results revealed a very strong correlation between perceptual severity and intelligibility measures and our classifier's performance scores, which is very promising for future work.
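The correlation reported above can be illustrated with a minimal sketch: given per-speaker phone classification accuracies and per-speaker perceptual scores, one computes a correlation coefficient between the two. The helper function and all data values below are hypothetical, for illustration only; they are not taken from the paper.

```python
# Hypothetical illustration: correlating per-speaker phone classification
# accuracy (from a CNN trained on healthy speech) with perceptual severity.
# All names and data below are illustrative, not from the paper.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative per-speaker values (a 0-10 perceptual severity scale is assumed;
# accuracy is the CNN's phone classification rate on that speaker's speech).
severity = [0.5, 2.0, 4.5, 6.0, 8.5]    # higher = more severe
accuracy = [0.82, 0.74, 0.61, 0.48, 0.31]

r = pearson(severity, accuracy)
print(f"Pearson r = {r:.3f}")  # strongly negative: accuracy drops with severity
```

In practice a rank correlation (e.g. Spearman's rho) is often preferred for ordinal perceptual scales, since it does not assume a linear relationship between the two measures.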
