Silent and voiced/unvoiced/mixed excitation (four-way) classification of speech

An algorithm is presented for automatically classifying speech into four categories: silence, and speech produced by three types of excitation, namely voiced, unvoiced, and mixed (a combination of voiced and unvoiced). The algorithm uses two-channel (speech and electroglottogram) signal analysis and has been tested on data from six speakers (three male and three female), each speaking five sentences. The algorithm achieved an overall classification accuracy of approximately 98.2% relative to skilled manual classification, which exceeds the accuracy of previously reported automatic classification schemes. If errors at word boundaries, including those at the beginning and end of sentences, are excluded, the accuracy rises to 99.5%.
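
The abstract does not state the decision rules, so the following is only a minimal sketch of how a frame-level four-way decision could combine the two channels, under assumptions of our own: EGG energy above a threshold is taken to signal vocal-fold vibration, low speech energy without EGG activity is taken as silence, and a crude high-frequency energy ratio of the speech signal is taken to signal frication noise. The features, the thresholds (`SILENCE_ENERGY`, `EGG_ENERGY`, `NOISE_RATIO`), and every function name are hypothetical and are not the published algorithm. The `accuracy` and `frames_near` helpers mirror the evaluation described above: frame labels are compared with manual labels, optionally skipping frames near hypothetical word-boundary indices.

```python
from typing import AbstractSet, List, Sequence, Set

# Hypothetical thresholds, chosen only for this illustration (not from the paper).
SILENCE_ENERGY = 1e-4  # speech mean-square energy below this may be silence
EGG_ENERGY = 1e-3      # EGG mean-square energy above this means the folds vibrate
NOISE_RATIO = 0.5      # difference-to-signal energy ratio above this means frication


def mean_square(frame: Sequence[float]) -> float:
    """Mean-square energy of one non-empty frame."""
    return sum(x * x for x in frame) / len(frame)


def high_freq_ratio(frame: Sequence[float]) -> float:
    """Energy of the first difference divided by the frame energy.

    The first difference x[n] - x[n-1] acts as a crude high-pass filter, so the
    ratio is near 0 for low-frequency voicing and near 2 for white noise.
    """
    total = sum(x * x for x in frame)
    if total == 0.0:
        return 0.0
    diff = sum((b - a) ** 2 for a, b in zip(frame, frame[1:]))
    return diff / total


def classify_frame(speech: Sequence[float], egg: Sequence[float]) -> str:
    """Label one time-aligned frame pair as 'S', 'V', 'U', or 'M' (hypothetical rules)."""
    egg_active = mean_square(egg) > EGG_ENERGY
    if not egg_active and mean_square(speech) < SILENCE_ENERGY:
        return "S"  # no fold vibration and negligible acoustic energy
    if not egg_active:
        return "U"  # acoustic energy without fold vibration
    # Folds vibrate: frication noise on top of voicing makes the frame mixed.
    return "M" if high_freq_ratio(speech) > NOISE_RATIO else "V"


def classify(speech: Sequence[float], egg: Sequence[float], frame_len: int) -> List[str]:
    """Split both channels into non-overlapping frames (frame_len > 0) and label each."""
    n_frames = min(len(speech), len(egg)) // frame_len
    return [
        classify_frame(speech[i * frame_len:(i + 1) * frame_len],
                       egg[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]


def frames_near(boundaries: Sequence[int], tolerance: int, n_frames: int) -> Set[int]:
    """Indices of frames within `tolerance` frames of any (hypothetical) word-boundary frame."""
    return {
        i
        for b in boundaries
        for i in range(b - tolerance, b + tolerance + 1)
        if 0 <= i < n_frames
    }


def accuracy(predicted: Sequence[str], reference: Sequence[str],
             excluded: AbstractSet[int] = frozenset()) -> float:
    """Fraction of frames whose label matches the manual reference, skipping excluded indices."""
    kept = [i for i in range(len(reference)) if i not in excluded]
    if not kept:
        return 0.0
    hits = sum(1 for i in kept if predicted[i] == reference[i])
    return hits / len(kept)
```

In this sketch the EGG channel alone decides whether voicing is present, and the speech channel only separates silence from excitation and detects noise-like energy during voicing. Any real use would require tuning the thresholds against manually labeled frames.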
