Automatic detection of voice impairments by means of short-term cepstral parameters and neural network based detectors

It is well known that vocal and voice diseases do not necessarily cause perceptible changes in the acoustic voice signal. Acoustic analysis is a useful diagnostic tool for voice diseases, complementing methods based on direct observation of the vocal folds by laryngoscopy. This paper studies two neural-network-based classification approaches applied to the automatic detection of voice disorders: the multilayer perceptron and learning vector quantization. Both are fed with short-term vectors computed according to the well-known Mel-frequency cepstral coefficient (MFCC) parameterization. The paper shows that these architectures allow voice disorders, including glottic cancer, to be detected with high reliability. In this context, learning vector quantization proved more reliable than the multilayer perceptron, yielding 96% frame accuracy under similar working conditions.
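To make the frame-level classification idea concrete, the following is a minimal sketch of the basic LVQ1 update rule applied to short-term feature vectors. It is not the authors' implementation: the function names (`train_lvq1`, `classify`, `make_toy_frames`, `frame_accuracy`), the number of prototypes, the learning-rate schedule, and the synthetic 12-dimensional Gaussian "frames" standing in for MFCC vectors are all assumptions chosen for illustration.

```python
import random

def nearest(prototypes, x):
    """Return the index of the prototype closest to x (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for i, (w, _) in enumerate(prototypes):
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_d:
            best, best_d = i, d
    return best

def train_lvq1(samples, labels, protos_per_class=2, epochs=30, lr0=0.1, seed=0):
    """Train labelled prototypes with the LVQ1 rule (hypothetical settings)."""
    rng = random.Random(seed)
    # Initialise prototypes from randomly chosen training vectors of each class.
    prototypes = []
    for c in sorted(set(labels)):
        members = [x for x, y in zip(samples, labels) if y == c]
        for x in rng.sample(members, protos_per_class):
            prototypes.append((list(x), c))
    order = list(range(len(samples)))
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)  # linearly decaying learning rate
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            w, c = prototypes[nearest(prototypes, x)]
            # Attract the winning prototype if its label matches, repel it otherwise.
            sign = 1.0 if c == y else -1.0
            for k in range(len(w)):
                w[k] += sign * lr * (x[k] - w[k])
    return prototypes

def classify(prototypes, x):
    """Label a single frame with the class of its nearest prototype."""
    return prototypes[nearest(prototypes, x)][1]

def make_toy_frames(n_per_class=200, dim=12, seed=1):
    """Synthetic stand-in for MFCC frames: two well-separated Gaussian clouds."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label, mean in (("normal", 0.0), ("pathological", 3.0)):
        for _ in range(n_per_class):
            samples.append([rng.gauss(mean, 1.0) for _ in range(dim)])
            labels.append(label)
    return samples, labels

def frame_accuracy(prototypes, samples, labels):
    """Fraction of frames whose predicted label matches the reference label."""
    hits = sum(classify(prototypes, x) == y for x, y in zip(samples, labels))
    return hits / len(samples)

if __name__ == "__main__":
    frames, targets = make_toy_frames()
    protos = train_lvq1(frames, targets)
    print("frame accuracy on toy data:", frame_accuracy(protos, frames, targets))
```

The accuracy this sketch reports refers only to the toy data and says nothing about the 96% figure quoted in the abstract. In a real setting the frames would be MFCC vectors extracted from recorded voices, and accuracy would be measured on frames from speakers not used for training.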
