Dimensionality reduction for voice disorders identification system based on Mel Frequency Cepstral Coefficients and Support Vector Machine

Strenuous daily activities and vocal abuse give rise to many diseases that impair the mechanism of voice production and result in pathological voices, which makes the identification of voice disorders a real challenge. In this context, automatic speech recognition techniques can serve as a complementary tool to other medical techniques. This paper proposes an algorithm based on short-term cepstral parameters, with Linear Discriminant Analysis (LDA) as the dimensionality reduction method and a Support Vector Machine (SVM) as the classifier. A comparative study is carried out, and system performance is evaluated in terms of accuracy, sensitivity, specificity, precision, and Area Under the Curve (AUC). Our findings indicate that voice disorders can be detected effectively using only the original (static) Mel Frequency Cepstral Coefficients (MFCC), without their first and second derivatives.
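The five evaluation measures named above have standard definitions for a two-class problem, and the following minimal sketch shows how they might be computed. It is not the paper's code: the function names, the toy labels and scores, the 0.5 decision threshold, and the convention that label 1 means "pathological" are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics. Assumed convention: 1 = pathological, 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a class or prediction is absent.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall on the pathological class
        "specificity": ratio(tn, tn + fp),  # recall on the healthy class
        "precision": ratio(tp, tp + fp),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in sample count, which is
    fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): classifier scores for 4 pathological and 4 healthy voices.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = binary_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Unlike the other four measures, AUC is computed from the continuous classifier scores rather than from thresholded decisions, so it does not depend on the chosen threshold.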
