Paper Information - Voice Pathology Detection Using Deep Learning: a Preliminary Study

This paper describes a preliminary investigation of voice pathology detection using deep neural networks (DNNs). We used voice recordings of the sustained vowel /a/ produced at normal pitch, taken from the German Saarbruecken Voice Database (SVD), a corpus that contains voice recordings and electroglottograph signals of more than 2,000 speakers. The core idea of the experiment is to apply convolutional layers in combination with recurrent Long Short-Term Memory (LSTM) layers directly to the raw audio signal. Each recording was split into 64 ms Hamming-windowed segments with 30 ms overlap. Our trained model achieved 71.36% accuracy with 65.04% sensitivity and 77.67% specificity on 206 validation files, and 68.08% accuracy with 66.75% sensitivity and 77.89% specificity on 874 testing files. These results are promising because they are comparable to those of a similar previously published experiment that used a different methodology. Further investigation is needed to reach state-of-the-art results.
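The segmentation step described in the abstract (64 ms Hamming-windowed segments with 30 ms overlap) can be sketched roughly as follows. This is a minimal illustration rather than the authors' code: the function name `frame_signal`, the 50 kHz sampling rate, the random placeholder signal, and the choice to discard a trailing partial segment are all assumptions made for the example.

```python
import numpy as np


def frame_signal(signal, sample_rate, window_ms=64.0, overlap_ms=30.0):
    """Split a 1-D signal into overlapping Hamming-windowed segments.

    Hypothetical helper: trailing samples that do not fill a whole
    window are dropped (an assumption, not stated in the abstract).
    """
    window_len = int(round(sample_rate * window_ms / 1000.0))
    overlap_len = int(round(sample_rate * overlap_ms / 1000.0))
    hop_len = window_len - overlap_len  # step between segment starts
    if hop_len <= 0:
        raise ValueError("overlap must be shorter than the window")

    signal = np.asarray(signal, dtype=np.float64)
    if signal.shape[0] < window_len:
        return np.empty((0, window_len))

    n_frames = 1 + (signal.shape[0] - window_len) // hop_len
    # Build an index matrix: one row of sample indices per segment.
    starts = hop_len * np.arange(n_frames)[:, None]
    idx = starts + np.arange(window_len)[None, :]
    # Apply the Hamming window to every segment.
    return signal[idx] * np.hamming(window_len)


# Usage with a placeholder signal: 1 second of noise at an assumed 50 kHz.
sample_rate = 50_000
recording = np.random.default_rng(0).standard_normal(sample_rate)
segments = frame_signal(recording, sample_rate)
print(segments.shape)  # (number of segments, samples per segment)
```

Under these assumptions, each segment spans 3,200 samples and consecutive segments start 1,700 samples apart (64 ms minus 30 ms of overlap). The resulting 2-D array could then be fed to a convolutional front end, although the abstract does not specify how the segments are batched.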
