Detecting Intelligibility by Linear Dimensionality Reduction and Normalized Voice Quality Hierarchical Features

Voice disorders can increase unhealthy social behavior and voice abuse, and can dramatically affect patients' quality of life. Automatic intelligibility detection therefore plays an important role in the timely treatment of pathological voices. This paper designs an intelligibility detection system characterized by two aspects. First, the system uses features inspired by voice pathology: voice quality features, spectral and harmonicity features, and hierarchical features. Second, detection is based on fusing individual linear dimensionality reduction models, such as asymmetric sparse partial least squares (ASPLS), each trained on a different set of normalized features. Experimental results show that our method achieves an unweighted recall of 71.88% on the test set, a gain of 2.98% absolute (4.33% relative) over the baseline of 68.9%.
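To make the reported figures concrete, the following minimal sketch shows how an unweighted recall can be computed and how the absolute and relative gains relate. It assumes that "unweighted recall" means the mean of the per-class recalls (often called unweighted average recall); the function names, the class labels "I" and "NI", and the toy predictions are hypothetical and not taken from the paper. Only the values 68.9 and 71.88 come from the abstract.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally
    regardless of how many samples it has (assumed definition)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def gains(baseline, system):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = system - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


# Toy, made-up labels: 4 samples of class "I", 2 of class "NI".
y_true = ["I", "I", "I", "I", "NI", "NI"]
y_pred = ["I", "I", "I", "NI", "NI", "I"]
uar = unweighted_average_recall(y_true, y_pred)  # (3/4 + 1/2) / 2

# Figures quoted in the abstract: baseline 68.9%, proposed system 71.88%.
abs_gain, rel_gain = gains(68.9, 71.88)
print(f"toy UAR = {uar:.3f}")
print(f"absolute gain = {abs_gain:.2f}, relative gain = {rel_gain:.2f}%")
```

Under this definition a classifier cannot score well by favoring the majority class, which is presumably why it is preferred over plain accuracy when the classes are imbalanced.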
