A methodology to evaluate pathological voice detection systems

This paper describes methodological issues to consider when designing systems for the automatic detection of voice pathology, so that results can be compared with previous or future experiments. The proposed methodology is built around the Kay Elemetrics voice disorders database, which is the only one commercially available. Key points about this database are discussed. Any experiment should follow a cross-validation strategy, and results should report, along with the final confusion matrix, confidence intervals for all measures. Detector performance curves such as DET plots are also considered. An example of the methodology is provided, with an experiment based on short-term parameters and multilayer perceptrons.
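
As an illustration of reporting confidence intervals alongside a confusion matrix, the following minimal sketch derives sensitivity, specificity and accuracy from the four cells of a two-class confusion matrix and attaches a 95% interval to each. It assumes the normal (Wald) approximation to the binomial, which may differ from the interval used in the paper itself; the function names and the counts are hypothetical and chosen only for the example.

```python
import math


def binomial_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using the normal (Wald) approximation.

    Assumption: z = 1.96 gives an approximate 95% interval; the bounds are
    clipped to the valid range [0, 1].
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def detection_measures(tp, fn, fp, tn):
    """Compute measures with intervals from a 2x2 confusion matrix.

    tp: pathological voices detected as pathological
    fn: pathological voices detected as normal
    fp: normal voices detected as pathological
    tn: normal voices detected as normal
    """
    return {
        "sensitivity": binomial_ci(tp, tp + fn),
        "specificity": binomial_ci(tn, tn + fp),
        "accuracy": binomial_ci(tp + tn, tp + fn + fp + tn),
    }


# Hypothetical counts, e.g. pooled over the test folds of a cross-validation.
measures = detection_measures(tp=90, fn=10, fp=5, tn=45)
for name, (value, low, high) in measures.items():
    print(f"{name}: {value:.3f} [{low:.3f}, {high:.3f}]")
```

The interval width depends on the number of test samples behind each measure, so in this example specificity, which is estimated from fewer samples, gets a wider interval than sensitivity even though both point estimates are equal.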
