Automatic Assessment of Pathological Voice Quality Using Multidimensional Acoustic Analysis Based on the GRBAS Scale

Although perceptual evaluation is considered the gold standard for assessing pathological voice quality, the considerably high inter- and intra-listener variability associated with perceptual ratings cannot be ignored. This variability probably stems from confounding factors such as listeners' perceptual bias, listeners' experience, and the type of rating scale used. Automatic objective assessment can serve as a useful tool for diagnosing pathological voices, and acoustic analysis can help determine the severity of dysphonia. The present study aimed to develop a complementary automatic voice assessment system using multidimensional acoustic measures based on the well-known GRBAS perceptual rating scale. A 65-dimensional set of measures, including traditional acoustic measures, Mel-frequency cepstral coefficients (MFCCs), Glottal-to-Noise Excitation (GNE) methods, and nonlinear dynamical analysis, was used to compose the feature matrix. To reduce feature redundancy, four different feature extraction techniques were applied. Multiclass classification was performed with a support vector machine (SVM) using a radial basis function (RBF) kernel and with an extreme learning machine (ELM). The classification results were moderately correlated with the GRBAS severity ratings, with best accuracies of 77.55% and 80.58%, respectively. These results suggest that such multidimensional acoustic analysis can be an appropriate tool for determining the presence and severity of voice disorders.
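The abstract names an extreme learning machine (ELM) as one of its two classifiers but gives no implementation details. As a rough illustration only, here is a minimal NumPy sketch of a regularized ELM for multiclass labels such as the GRBAS grades 0 (normal) to 3 (severe). The class name `ELMClassifier`, the sigmoid activation, the z-scoring of features, the hidden-layer size `n_hidden`, the ridge constant `C`, and the synthetic data are all assumptions for illustration and are not taken from the study.

```python
import numpy as np


class ELMClassifier:
    """Minimal regularized extreme learning machine (hypothetical sketch).

    Input-to-hidden weights are random and never trained; only the
    hidden-to-output weights are solved in closed form.
    """

    def __init__(self, n_hidden=100, C=1.0, seed=0):
        self.n_hidden = n_hidden  # number of random hidden neurons (assumed)
        self.C = C                # ridge regularization constant (assumed)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # z-score each feature with training statistics, then apply the
        # sigmoid to the random projection: H = g(ZW + b)
        Z = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        return 1.0 / (1.0 + np.exp(-(Z @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        self.std_[self.std_ == 0] = 1.0  # guard against constant features
        self.classes_ = np.unique(y)
        # One-hot target matrix T with shape (n_samples, n_classes)
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        n_features = X.shape[1]
        # Random input weights, scaled so pre-activations stay near unit variance
        self.W = self.rng.normal(size=(n_features, self.n_hidden)) / np.sqrt(n_features)
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Closed-form ridge solution: beta = (I / C + H^T H)^(-1) H^T T
        A = np.eye(self.n_hidden) / self.C + H.T @ H
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        # The class with the largest output score wins
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


# Toy usage: synthetic clusters standing in for a 65-column feature matrix
# and GRBAS grades 0-3 (NOT data from the study).
rng = np.random.default_rng(1)
centers = rng.normal(0.0, 3.0, size=(4, 65))
y_train = np.repeat(np.arange(4), 50)
X_train = centers[y_train] + rng.normal(size=(200, 65))
y_test = np.repeat(np.arange(4), 10)
X_test = centers[y_test] + rng.normal(size=(40, 65))

model = ELMClassifier(n_hidden=100, C=10.0).fit(X_train, y_train)
accuracy = np.mean(model.predict(X_test) == y_test)
print(f"toy accuracy: {accuracy:.2f}")
```

The only trained parameters are the output weights `beta`, obtained in a single linear solve, which is what makes an ELM fast to train compared with an iteratively optimized network. In practice `n_hidden` and `C` would be tuned by cross-validation, much as the penalty `C` and the kernel width are tuned for an RBF-kernel SVM.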
