Automated speech analysis applied to laryngeal disease categorization

The long-term goal of this work is a decision support system for the diagnosis of laryngeal diseases. Colour images of the vocal folds, a voice signal, and questionnaire data are the information sources to be used in the analysis. This paper is concerned with the automated analysis of a voice signal applied to the screening of laryngeal diseases. The effectiveness of 11 different feature sets is investigated for classifying voice recordings of the sustained phonation of the vowel sound /a/ into a healthy class and two pathological classes, diffuse and nodular. A k-NN classifier, an SVM, and a committee built using various aggregation options are used for the classification. The study used a mixed-gender database containing 312 voice recordings. A correct classification rate of 84.6% was achieved with a four-member SVM committee. The pitch and amplitude perturbation measures, cepstral energy features, autocorrelation features, and linear prediction cosine transform coefficients were amongst the feature sets providing the best performance. For two-class classification, using recordings from 79 subjects representing the pathological class and 69 representing the healthy class, a correct classification rate of 95.5% was obtained with a five-member committee. Again, the pitch and amplitude perturbation measures provided the best performance.
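Pitch and amplitude perturbation are commonly summarised by jitter and shimmer. The sketch below computes the widely used "local" variants: the mean absolute difference between consecutive glottal cycles, divided by the mean value. It assumes that cycle periods and per-cycle peak amplitudes have already been extracted by a pitch-marking step (not shown). The function names and sample values are hypothetical, and the exact perturbation measures used in this study may differ.

```python
from typing import Sequence


def local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference of consecutive values divided by the mean value.

    Applied to pitch periods this gives local jitter; applied to per-cycle
    peak amplitudes it gives local shimmer. Returned as a fraction.
    """
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(a - b) for a, b in zip(values[:-1], values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def jitter_local(periods_s: Sequence[float]) -> float:
    """Local jitter from consecutive pitch periods (seconds)."""
    return local_perturbation(periods_s)


def shimmer_local(amplitudes: Sequence[float]) -> float:
    """Local shimmer from consecutive per-cycle peak amplitudes."""
    return local_perturbation(amplitudes)


if __name__ == "__main__":
    # Hypothetical cycle data for a sustained /a/ at roughly 125 Hz (8 ms periods);
    # in practice these come from a pitch-marking algorithm.
    periods = [0.0080, 0.0082, 0.0079, 0.0081, 0.0080]
    amps = [0.50, 0.52, 0.49, 0.51, 0.50]
    print(f"jitter (local):  {jitter_local(periods) * 100:.2f} %")
    print(f"shimmer (local): {shimmer_local(amps) * 100:.2f} %")
```

Such scalar measures, possibly alongside other perturbation variants, would form one feature set passed to the k-NN or SVM classifiers.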
