Voice Pathology Detection Using Deep Learning: a Preliminary Study

This paper describes a preliminary investigation of voice pathology detection using deep neural networks (DNNs). We used voice recordings of the sustained vowel /a/ produced at normal pitch, taken from the German corpus Saarbruecken Voice Database (SVD). This corpus contains voice recordings and electroglottograph signals of more than 2,000 speakers. The experiment applies convolutional layers in combination with recurrent Long Short-Term Memory (LSTM) layers directly to the raw audio signal. Each recording was split into 64 ms Hamming-windowed segments with 30 ms overlap. Our trained model achieved 71.36% accuracy with 65.04% sensitivity and 77.67% specificity on 206 validation files, and 68.08% accuracy with 66.75% sensitivity and 77.89% specificity on 874 testing files. This is a promising result in favor of the approach, because it is comparable to a similar previously published experiment that used a different methodology. Further investigation is needed to achieve state-of-the-art results.
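The segmentation step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `hamming` and `frame_signal`, the 50 kHz sample rate, the synthetic test tone, and the choice to drop trailing samples that do not fill a whole window are all assumptions made for illustration. Only the 64 ms window length, the 30 ms overlap, and the Hamming window come from the text above.

```python
import math


def hamming(n):
    """Return a symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(signal, sample_rate, win_ms=64.0, overlap_ms=30.0):
    """Split a 1-D signal into overlapping Hamming-windowed frames.

    win_ms and overlap_ms default to the values quoted in the abstract.
    Trailing samples that do not fill a whole window are dropped
    (an assumption; the abstract does not say how the tail is handled).
    """
    win_len = int(round(sample_rate * win_ms / 1000.0))
    overlap = int(round(sample_rate * overlap_ms / 1000.0))
    hop = win_len - overlap  # step between consecutive frame starts
    if hop <= 0:
        raise ValueError("overlap must be shorter than the window")
    window = hamming(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        segment = signal[start:start + win_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


# Usage with a synthetic 1-second tone standing in for a vowel recording.
sample_rate = 50_000  # assumed sample rate, not stated in the abstract
signal = [math.sin(2.0 * math.pi * 220.0 * t / sample_rate)
          for t in range(sample_rate)]
frames = frame_signal(signal, sample_rate)
print(len(frames), len(frames[0]))
```

Under these assumptions each frame holds 3200 samples and consecutive frames start 1700 samples apart, so every frame shares its last 1500 samples (30 ms) with the start of the next one. The resulting frame sequence is the kind of input a convolutional front end followed by LSTM layers would consume.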
