Automatic Detection of Laryngeal Pathologies in Records of Sustained Vowels by Means of Mel-Frequency Cepstral Coefficient Parameters and Differentiation of Patients by Sex

Mel-frequency cepstral coefficients (MFCC) have traditionally been used in speaker identification applications. In recent years, their use has been extended to speech quality assessment for clinical applications. Although the relevance of these parameters to such an application may not be obvious at first, previous research has demonstrated their robustness and statistical significance, as well as their close relationship with glottal noise measurements. This paper reviews this parameterization scheme and analyzes its performance for voice analysis when patients are differentiated by sex. Differentiation by sex is common practice when establishing normative values for traditional voice descriptors (e.g., pitch, jitter, formants), but it had not yet been tested for cepstral analysis of voice for clinical purposes. This paper shows that differentiating patients by sex beforehand can significantly improve the performance of MFCC-based automatic detection of laryngeal pathology in voice records.
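The abstract does not state the exact MFCC configuration used. As a minimal sketch under assumed settings, the NumPy code below follows a conventional MFCC pipeline: pre-emphasis, Hamming-windowed frames, power spectrum, a triangular filterbank on the mel scale (using the common 2595·log10(1 + f/700) variant), logarithm, and an orthonormal DCT-II. The frame length, hop, number of filters, and number of coefficients are illustrative assumptions, not the paper's values. The `group_by_sex` helper is hypothetical. It only shows how frame-level MFCC vectors could be pooled separately for female and male speakers before training sex-specific detectors.

```python
import numpy as np


def hz_to_mel(f):
    # One common mel-scale variant (an assumption; implementations differ).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced in mel, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


def dct_ii_ortho(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return x @ basis.T


def mfcc(signal, sample_rate, frame_ms=20.0, hop_ms=10.0,
         n_filters=26, n_coeffs=13, pre_emphasis=0.97):
    """Return an (n_frames, n_coeffs) MFCC matrix; all defaults are assumptions."""
    signal = np.asarray(signal, dtype=float)
    # First-order pre-emphasis filter: y[n] = x[n] - a * x[n - 1].
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if len(emphasized) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    n_fft = 1 << (frame_len - 1).bit_length()  # next power of two
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energies = np.log(power @ fbank.T + 1e-10)  # small floor avoids log(0)
    return dct_ii_ortho(log_energies, n_coeffs)


def group_by_sex(recordings, n_coeffs=13):
    """Hypothetical helper: pool frame-level MFCCs per sex label ('F' or 'M')."""
    groups = {"F": [], "M": []}
    for signal, sample_rate, sex in recordings:
        groups[sex].append(mfcc(signal, sample_rate, n_coeffs=n_coeffs))
    return {sex: np.vstack(mats) if mats else np.empty((0, n_coeffs))
            for sex, mats in groups.items()}
```

Under this assumed setup, each pooled matrix would train its own pathology detector (for example, one model per class and per sex), and a test recording would be scored only by the detector that matches the speaker's sex.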
