Dimensionality Reduction of a Pathological Voice Quality Assessment System Based on Gaussian Mixture Models and Short-Term Cepstral Parameters

Voice diseases have increased dramatically in recent times, mainly because of unhealthy social habits and voice abuse. These diseases must be diagnosed and treated at an early stage, especially in the case of larynx cancer. It is widely recognized that vocal and voice diseases do not necessarily cause changes in voice quality as perceived by a listener, so acoustic analysis could be a useful tool for diagnosing this type of disease. Preliminary research has shown that voice alterations can be detected by means of Gaussian mixture models and short-term mel cepstral parameters, complemented by frame energy together with first and second derivatives. Using the F-Ratio and Fisher's discriminant ratio, this paper demonstrates that voice impairments can be detected using only the mel cepstral vectors and their first derivative, ignoring the second derivative.
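The abstract does not give the formulas behind the two ranking criteria, so the following is only a minimal sketch under stated assumptions. It uses common textbook definitions: a per-feature two-class Fisher's discriminant ratio, (mu1 - mu2)^2 / (var1 + var2), and a per-feature F-Ratio, the variance of the class means divided by the mean within-class variance. The function names, the synthetic data, and the three-feature layout are hypothetical and are not taken from the paper.

```python
import numpy as np


def fisher_discriminant_ratio(x_a, x_b):
    """Per-feature two-class FDR: (mu_a - mu_b)^2 / (var_a + var_b).

    Assumed textbook definition; rows are frames, columns are features.
    """
    mu_a, mu_b = x_a.mean(axis=0), x_b.mean(axis=0)
    var_a, var_b = x_a.var(axis=0), x_b.var(axis=0)
    return (mu_a - mu_b) ** 2 / (var_a + var_b)


def f_ratio(classes):
    """Per-feature F-Ratio: variance of class means / mean within-class variance.

    Assumed textbook definition; `classes` is a list of (frames, features) arrays.
    """
    class_means = np.stack([c.mean(axis=0) for c in classes])
    within_var = np.stack([c.var(axis=0) for c in classes]).mean(axis=0)
    return class_means.var(axis=0) / within_var


# Synthetic stand-in for feature vectors (NOT real voice data):
# feature 0 separates the two classes, features 1 and 2 do not.
rng = np.random.default_rng(0)
normal = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(500, 3))
pathological = rng.normal(loc=[3.0, 0.0, 0.0], scale=1.0, size=(500, 3))

fdr = fisher_discriminant_ratio(normal, pathological)
fr = f_ratio([normal, pathological])

# Rank features from most to least discriminative according to the FDR.
ranking = np.argsort(fdr)[::-1]
print("FDR per feature:    ", fdr)
print("F-Ratio per feature:", fr)
print("Ranking by FDR:     ", ranking)
```

With these particular definitions and exactly two classes, the F-Ratio is the FDR divided by two, so both criteria produce the same ranking. In a setting like the one the abstract describes, the columns would hold static cepstral coefficients and their derivatives, and low-scoring groups of columns would be candidates for removal.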
