An improved method for voice pathology detection by means of a HMM-based feature space transformation

This paper presents a new feature transformation technique that improves screening accuracy in the automatic detection of pathological voices. The statistical transformation is based on Hidden Markov Models: the transformation and classification stages are obtained simultaneously, and the model parameters are adjusted with a criterion that minimizes the classification error. The original feature vectors are built from classic short-term noise parameters and mel-frequency cepstral coefficients. Compared with conventional approaches in the literature on automatic detection of pathological voices, the proposed feature space transformation significantly improves performance without adding new features to the original input space. In view of these results, the technique is expected to perform well in other areas, such as speaker verification and/or identification.
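
To make the idea of a criterion that minimizes the classification error concrete, the following is a minimal sketch of a smoothed minimum classification error (MCE) loss in its commonly used form. It is not taken from the paper: the function names, the smoothing parameters `eta` and `gamma`, and the example scores are illustrative assumptions, and the per-class scores stand in for the log-likelihoods that class-specific HMMs would produce.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def misclassification_measure(scores, true_class, eta=1.0):
    """Return d: negative when the true class wins, positive when it loses.

    scores: one discriminant value per class (assumed here to be
    log-likelihoods from per-class models; this is an assumption).
    eta: hypothetical smoothing parameter for the soft maximum.
    """
    competitors = [s for k, s in enumerate(scores) if k != true_class]
    m = max(competitors)
    # Log-mean-exp over the competing classes, shifted by m for stability.
    mean_exp = sum(math.exp(eta * (s - m)) for s in competitors) / len(competitors)
    soft_max = m + math.log(mean_exp) / eta
    return -scores[true_class] + soft_max


def mce_loss(scores, true_class, gamma=1.0, eta=1.0):
    """Smooth, differentiable approximation of the 0/1 classification error."""
    return sigmoid(gamma * misclassification_measure(scores, true_class, eta))


# Hypothetical log-likelihoods for the classes [normal, pathological].
example_scores = [-10.0, -12.0]
loss_if_normal = mce_loss(example_scores, true_class=0)
loss_if_pathological = mce_loss(example_scores, true_class=1)
```

With these example scores the loss falls below 0.5 when the true class is the higher-scoring one and above 0.5 otherwise. Because the loss is differentiable, its gradient can in principle be propagated to both the classifier parameters and a feature transformation, which is the general mechanism a joint transformation and classification stage relies on.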
