Pathological Voice Classification Using Deep Learning

Voice classification deals with sequential data. It is well known that this type of data is processed well by recurrent neural networks. In this work, we show that for longer sequences a convolutional neural network can give better accuracy, whereas a recurrent network suffers from the vanishing gradient problem even with a complex model such as Long Short-Term Memory (LSTM). To illustrate the method, we use the task of pathological voice detection. A pathological voice is a disorder of the human voice caused by an internal defect in the throat, and it is very hard to detect. We experiment with low-dimensional features in order to compare the two models rather than to improve overall accuracy.
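The vanishing gradient problem mentioned above can be illustrated with a toy example. The sketch below is not the model used in this work: it assumes a scalar recurrent unit h_t = tanh(w * h_{t-1} + x_t) with a hypothetical weight w = 0.9 and constant inputs, and it measures how strongly the final state depends on the initial state as the sequence grows. The function name `gradient_through_time` and all values are illustrative assumptions.

```python
import math


def gradient_through_time(weight, inputs, h0=0.0):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(weight * h_{t-1} + x_t).

    Toy illustration only: one hidden unit, no gating, hypothetical values.
    """
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(weight * h + x)
        # Chain rule: d h_t / d h_{t-1} = weight * (1 - tanh^2(...)) = weight * (1 - h_t^2)
        grad *= weight * (1.0 - h * h)
    return abs(grad)


if __name__ == "__main__":
    # Longer sequences multiply more factors of magnitude below 1,
    # so the gradient with respect to the first step shrinks geometrically.
    for length in (10, 100, 1000):
        inputs = [0.1] * length
        print(length, gradient_through_time(0.9, inputs))
```

Each step multiplies the gradient by a factor whose magnitude is at most |w|, so with |w| < 1 the contribution of early time steps decays geometrically with sequence length. Gated units such as LSTM are designed to reduce this decay, which is why the comparison on long sequences is of interest here.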
