Exploring similarity-based classification of larynx disorders from human voice

This paper investigates the identification of laryngeal disorders using cepstral parameters of the human voice. Mel-frequency cepstral coefficients (MFCCs), extracted from audio recordings of patients' voices, are further approximated using various strategies (sampling, averaging, and clustering by a Gaussian mixture model (GMM)). The effectiveness of similarity-based classification techniques in categorizing such pre-processed data into normal voice, nodular, and diffuse vocal fold lesion classes is explored, and schemes for combining the binary decisions of support vector machines (SVMs) are evaluated. The widely used RBF kernel is compared to several custom kernels: (i) a sequence kernel, defined over a pair of matrices rather than a pair of vectors, which calculates the kernelized principal angle (KPA) between subspaces; (ii) a simple supervector kernel using only the means of a patient's GMM; (iii) two distance kernels, specifically tailored to exploit the covariance matrices of the GMM, which use as similarity metrics the Monte Carlo sampling approximation of the Kullback-Leibler divergence (KL-MCS) and the Kullback-Leibler divergence combined with the Earth mover's distance (KL-EMD). The sequence kernel and the distance kernels both outperformed the popular RBF kernel, but the difference is statistically significant only for the distance kernels. When tested on voice recordings collected from 410 subjects (130 normal voice, 140 diffuse, and 140 nodular vocal fold lesions), the KL-MCS kernel, using GMMs with full covariance matrices, and the KL-EMD kernel, using GMMs with diagonal covariance matrices, provided the best overall performance. In most cases, SVM reached higher accuracy than least squares SVM, except for common binary classification using distance kernels.
The results indicate that modeling features with GMMs and exploiting this information with kernel methods is an interesting fusion of generative (probabilistic) and discriminative (hyperplane) models for similarity-based classification.
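
To make the KL-MCS idea concrete, the following is a minimal sketch rather than the authors' implementation. It estimates the Kullback-Leibler divergence between two Gaussian mixtures by Monte Carlo sampling and turns it into a similarity value. Several choices here are assumptions made for illustration and are not stated in the abstract: the mixtures are one-dimensional (the paper uses GMMs over MFCC vectors with diagonal or full covariance matrices), the divergence is symmetrized by summing both directions, and the kernel takes the form exp(-gamma * D). The function names and the parameter gamma are hypothetical.

```python
import math
import random


def gmm_pdf(x, gmm):
    """Density at x of a 1-D Gaussian mixture given as (weight, mean, std) triples."""
    return sum(
        w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        for w, mu, sd in gmm
    )


def gmm_sample(gmm, n, rng):
    """Draw n samples: pick a component by its weight, then sample from it."""
    weights = [w for w, _, _ in gmm]
    components = rng.choices(gmm, weights=weights, k=n)
    return [rng.gauss(mu, sd) for _, mu, sd in components]


def kl_mc(p, q, n=5000, seed=0):
    """Monte Carlo estimate of KL(p || q): the mean of log p(x) - log q(x) for x ~ p."""
    rng = random.Random(seed)
    floor = 1e-300  # guards against log(0) when a sample falls far from q
    total = 0.0
    for x in gmm_sample(p, n, rng):
        total += math.log(max(gmm_pdf(x, p), floor)) - math.log(max(gmm_pdf(x, q), floor))
    return total / n


def distance_kernel(p, q, gamma=1.0):
    """Assumed kernel form: exp(-gamma * D), with D the symmetrized KL estimate."""
    d = kl_mc(p, q) + kl_mc(q, p)
    # A Monte Carlo estimate can dip slightly below zero, so clamp it.
    return math.exp(-gamma * max(d, 0.0))


# Hypothetical per-speaker mixtures standing in for GMMs fitted to MFCC frames.
voice_a = [(0.6, 0.0, 1.0), (0.4, 3.0, 0.5)]
voice_b = [(0.5, 1.0, 1.5), (0.5, 4.0, 1.0)]
print(distance_kernel(voice_a, voice_a))
print(distance_kernel(voice_a, voice_b))
```

A matrix of such values computed over all pairs of recordings could be passed to an SVM as a precomputed kernel. Note that a kernel built from a divergence in this way is not guaranteed to be positive semidefinite.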
