Automatic Detection of Pathological Voices Using Complexity Measures, Noise Parameters, and Mel-Cepstral Coefficients

This paper proposes a new approach to increasing the amount of information extracted from speech, with the aim of improving the accuracy of a system for the automatic detection of pathological voices. It examines the discrimination capabilities of 11 features extracted through nonlinear time-series analysis. Two of these features are based on conventional nonlinear statistics (the largest Lyapunov exponent and the correlation dimension), two are based on recurrence and fractal-scaling analysis, and the remaining seven are based on different estimates of entropy. In addition, the paper uses a classifier-combination strategy to fuse the nonlinear analysis with the information provided by classic parameterization approaches found in the literature (noise parameters and mel-frequency cepstral coefficients). Classification is carried out in two steps, first with a generative approach and then with a discriminative one. Combining both classifiers, the best accuracy obtained is 98.23% ± 0.001.
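To make one of the named features concrete, the following is a minimal sketch of a correlation-dimension estimate in the Grassberger-Procaccia style: the signal is delay-embedded, the correlation sum C(r) is computed as the fraction of point pairs closer than r, and the dimension is taken as the least-squares slope of log C(r) against log r. This is an illustration under stated assumptions, not the procedure used in the paper. The function names, the embedding dimension (2), the delay (6 samples), the max-norm distance, the radii, and the synthetic sine and noise signals are all hypothetical choices made for the example.

```python
import math
import random


def delay_embed(series, dim, tau):
    """Return delay vectors [x[i], x[i+tau], ..., x[i+(dim-1)*tau]]."""
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors < 2:
        raise ValueError("series too short for this embedding")
    return [[series[i + k * tau] for k in range(dim)] for i in range(n_vectors)]


def pairwise_max_norm(points):
    """Max-norm distance for every unordered pair of points."""
    dists = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dists.append(max(abs(a - b) for a, b in zip(points[i], points[j])))
    return dists


def correlation_dimension(series, dim, tau, radii):
    """Slope of log C(r) against log r, fitted by least squares."""
    dists = pairwise_max_norm(delay_embed(series, dim, tau))
    xs, ys = [], []
    for r in radii:
        c = sum(d < r for d in dists) / len(dists)  # correlation sum C(r)
        if c > 0:  # log is undefined when no pair falls within r
            xs.append(math.log(r))
            ys.append(math.log(c))
    if len(xs) < 2:
        raise ValueError("need at least two radii with C(r) > 0")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Illustrative signals (assumptions): a periodic tone and uniform white noise.
RADII = [0.1, 0.15, 0.2, 0.3, 0.4]
sine = [math.sin(i / 4.0) for i in range(300)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(300)]

d_sine = correlation_dimension(sine, dim=2, tau=6, radii=RADII)
d_noise = correlation_dimension(noise, dim=2, tau=6, radii=RADII)
print(f"sine: {d_sine:.2f}  noise: {d_noise:.2f}")
```

In this toy setting the periodic signal traces a closed curve in the embedding space, so its estimate should sit near 1, while the noise fills the plane and should give a clearly larger value. On real voice recordings the embedding parameters and the scaling region over which the slope is fitted would have to be chosen with care.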
