Paper information - Raw Speech Waveform Based Classification of Patients with ALS, Parkinson's Disease and Healthy Controls Using CNN-BLSTM

Raw Speech Waveform Based Classification of Patients with ALS, Parkinson's Disease and Healthy Controls Using CNN-BLSTM

Automated analysis of the speech waveform in patients with Amyotrophic Lateral Sclerosis (ALS) and Parkinson’s disease (PD) can be used for early diagnosis and for monitoring disease progression. Many earlier works have used different acoustic features to distinguish patients with ALS and PD from healthy controls (HC). In this work, we propose a data-driven approach that learns representations from the raw speech waveform. Our model comprises a 1-D CNN layer that extracts representations from raw speech, followed by BLSTM layers for the classification tasks. We consider three classification tasks: ALS vs. HC, PD vs. HC, and ALS vs. PD. We perform each classification task using four different speech stimuli in two scenarios: i) trained and tested in a stimulus-specific manner, and ii) trained on data pooled from all stimuli and tested on each stimulus separately. Experiments with 60 ALS, 60 PD, and 60 HC subjects show that the frequency responses of the learned 1-D CNN filters are low-pass in nature, with center frequencies below 1 kHz. The representations learned from raw speech perform better than MFCCs, which serve as the baseline. Experiments with pooled models yield better results than the task-specific models.
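The observation about filter frequency responses can be checked for any 1-D filter by evaluating its magnitude response. The following is a minimal sketch, not the authors' analysis code: it assumes a 16 kHz sampling rate, takes "center frequency" to mean the frequency of peak magnitude, and uses two synthetic kernels (a moving average and a Hann-windowed 500 Hz cosine) as stand-ins for learned filters. All function names and constants are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 16000  # Hz; assumed for illustration, not taken from the abstract


def magnitude_response(kernel, n_bins=256):
    """Magnitude of the kernel's frequency response at n_bins points in [0, Nyquist]."""
    mags = []
    for k in range(n_bins):
        omega = math.pi * k / (n_bins - 1)  # normalized angular frequency
        h = sum(w * cmath.exp(-1j * omega * n) for n, w in enumerate(kernel))
        mags.append(abs(h))
    return mags


def peak_frequency_hz(kernel, sample_rate=SAMPLE_RATE, n_bins=256):
    """Frequency (Hz) at which the magnitude response is largest."""
    mags = magnitude_response(kernel, n_bins)
    k = max(range(n_bins), key=mags.__getitem__)
    return k / (n_bins - 1) * (sample_rate / 2)


# Stand-in 1: an 8-tap moving average, a textbook low-pass filter.
moving_average = [1 / 8] * 8

# Stand-in 2: a 400-tap Hann-windowed cosine centered at 500 Hz.
n_taps = 400
windowed_cosine = [
    (0.5 - 0.5 * math.cos(2 * math.pi * n / (n_taps - 1)))
    * math.cos(2 * math.pi * 500 * n / SAMPLE_RATE)
    for n in range(n_taps)
]

ma_mags = magnitude_response(moving_average)
ma_peak = peak_frequency_hz(moving_average)
wc_peak = peak_frequency_hz(windowed_cosine)

print(f"moving average peak: {ma_peak:.1f} Hz")
print(f"windowed cosine peak: {wc_peak:.1f} Hz")
```

Applied to the weights of a trained first-layer convolution, the same two functions would report where each filter concentrates its gain; the frequency resolution is limited by `n_bins`.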
