An efficient voice pathology classification scheme based on applying multi-layer linear discriminant analysis to wavelet packet-based features

Abstract In this work, we develop an efficient voice disorder classification system using the discrete wavelet packet transform (DWPT), multi-class linear discriminant analysis (MC-LDA), and a multilayer neural network (ML-NN). The characteristics of normal and pathological voices are described by the energy and Shannon entropy extracted from the coefficients at the output nodes of the best wavelet packet tree with eight decomposition levels. The two wavelet packet-based feature sets, energy and Shannon entropy, are extracted separately, and each is reduced by multi-class linear discriminant analysis to a two-dimensional feature vector. The experiments use 258 data samples comprising normal voices and speech signals impaired by three kinds of disorders: A–P squeezing, gastric reflux, and hyperfunction. Voice disorder classification results on the Kay Elemetrics database, developed by the Massachusetts Eye and Ear Infirmary (MEEI), show average classification accuracies of 96.67% and 97.33% for the structures built on wavelet packet-based energy and entropy features, respectively. In these structures, the feature vectors are optimized by multi-class linear discriminant analysis and then classified by a multilayer neural network. The confusion matrix and cross-validation results indicate that the proposed system improves classification accuracy significantly while keeping complexity low. The proposed voice pathology classification tool could therefore be used for early detection of laryngeal pathology and for assessing vocal improvement following voice therapy in a clinical setting.
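The feature pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a Haar wavelet, a full three-level packet tree rather than the best tree with eight levels, a normalized form of the Shannon entropy, synthetic sinusoid-plus-noise signals in place of the MEEI recordings, and hypothetical function names. The final multilayer neural network stage is omitted.

```python
import numpy as np

def haar_wavelet_packet(signal, levels):
    """Terminal-node coefficients of a full Haar wavelet packet tree.
    Assumes len(signal) is divisible by 2**levels."""
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        next_nodes = []
        for c in nodes:
            even, odd = c[0::2], c[1::2]
            next_nodes.append((even + odd) / np.sqrt(2.0))  # low-pass branch
            next_nodes.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        nodes = next_nodes
    return nodes

def node_energy(nodes):
    """Sum of squared coefficients in each terminal node."""
    return np.array([np.sum(c ** 2) for c in nodes])

def node_shannon_entropy(nodes, eps=1e-12):
    """Shannon entropy of each node's normalized squared coefficients
    (one common definition; assumed here, not taken from the paper)."""
    out = []
    for c in nodes:
        p = c ** 2 / (np.sum(c ** 2) + eps)
        out.append(-np.sum(p * np.log(p + eps)))
    return np.array(out)

def lda_projection(X, y, n_components=2):
    """Multi-class LDA: leading eigenvectors of pinv(Sw) @ Sb."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for label in np.unique(y):
        Xc = X[y == label]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real

# Synthetic stand-in data: four classes (one "normal", three "disorders"),
# each a sinusoid of a different frequency plus noise. Purely illustrative.
rng = np.random.default_rng(0)
t = np.arange(256) / 256.0
signals, labels = [], []
for label, freq in enumerate([5, 20, 50, 100]):
    for _ in range(20):
        signals.append(np.sin(2 * np.pi * freq * t) + 0.3 * rng.standard_normal(256))
        labels.append(label)
labels = np.array(labels)

# Energy and entropy features are built and reduced separately.
trees = [haar_wavelet_packet(s, levels=3) for s in signals]
X_energy = np.array([node_energy(n) for n in trees])
X_entropy = np.array([node_shannon_entropy(n) for n in trees])
Z_energy = X_energy @ lda_projection(X_energy, labels)     # shape (80, 2)
Z_entropy = X_entropy @ lda_projection(X_entropy, labels)  # shape (80, 2)
```

The two-dimensional vectors in `Z_energy` and `Z_entropy` play the role of the reduced features that the abstract passes to the classifier. In practice a wavelet library and a tested LDA implementation would likely replace the hand-written routines above.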
