Deep Learning Solution for Pathological Voice Detection using LSTM-based Autoencoder Hybrid with Multi-Task Learning

In this paper, a deep learning approach is introduced to detect pathological voice disorders from continuous speech. Speech as a bio-signal is receiving increasing attention as a discriminant for different diseases. To exploit the information in speech, a hybrid of a long short-term memory (LSTM) autoencoder and multi-task learning is proposed, with the spectrogram as the input feature. Different speech databases (voice disorders, depression, Parkinson’s disease) are used as evaluation datasets. The applicability of the method is demonstrated by accuracies of 85% for Parkinson’s disease, 86% for dysphonia, and 90% for depression on the test datasets. The advantage of this method is that it is fully data-driven, in the sense that it does not require separate acoustic-phonetic preprocessing for each type of disease to be recognized. We believe that the method applied in this article can be applied to other diseases as well and can be used for other
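As a rough illustration of how an autoencoder can be combined with multi-task learning, the sketch below computes a joint objective: a reconstruction term for the spectrogram and a classification term for the pathological/healthy decision. The abstract does not state the actual loss functions or their weighting, so the mean squared error, the binary cross-entropy, the weight `alpha`, and all function names here are assumptions chosen for illustration only.

```python
import math


def reconstruction_loss(spectrogram, reconstruction):
    """Mean squared error between input frames and the decoder output.

    Both arguments are lists of frames; each frame is a list of floats.
    """
    total, count = 0.0, 0
    for frame, rec_frame in zip(spectrogram, reconstruction):
        for x, y in zip(frame, rec_frame):
            total += (x - y) ** 2
            count += 1
    return total / count


def classification_loss(prob_pathological, label):
    """Binary cross-entropy for one utterance (label 1 = pathological)."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(prob_pathological, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def multitask_loss(spectrogram, reconstruction, prob_pathological, label,
                   alpha=0.5):
    """Weighted sum of the two task losses.

    alpha is a hypothetical trade-off weight, not a value from the paper.
    """
    rec = reconstruction_loss(spectrogram, reconstruction)
    cls = classification_loss(prob_pathological, label)
    return alpha * rec + (1.0 - alpha) * cls


# Toy 2-frame, 2-bin "spectrogram" and an imperfect reconstruction.
spec = [[0.0, 1.0], [1.0, 0.0]]
rec = [[1.0, 1.0], [1.0, 0.0]]
loss = multitask_loss(spec, rec, prob_pathological=0.5, label=1)
```

In such a setup, the shared encoder receives gradients from both terms, so the learned representation has to support reconstruction and classification at the same time; the weight between the two terms would normally be tuned on validation data.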
