A comparative study of signal representations and classification techniques for speech recognition

The authors investigate the interactions of two important sets of techniques in speech recognition: signal representation and classification. In addition, to quantify the effect of the telephone network, experiments are performed on both wideband and telephone-quality speech. The spectral and cepstral signal processing techniques studied fall into a few major categories based on Fourier analysis, linear prediction, and auditory processing. The classification techniques examined are the Gaussian classifier, the mixture-of-Gaussians classifier, and the multilayer perceptron (MLP). Results indicate that the MLP consistently produces lower error rates than the other two classifiers. When averaged across all three classifiers, the Bark auditory spectral coefficients (BASC) produce the lowest phonetic classification error rates. When evaluated in a stochastic segment framework using the MLP, BASC also produces the lowest word error rate.
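As a rough illustration of the simplest of the three classifiers, the sketch below fits one Gaussian per phonetic class and labels a feature vector by maximum posterior. It is not the authors' implementation: the diagonal covariance, the variance floor, the function names (`fit_gaussians`, `log_likelihood`, `classify`), and the toy two-dimensional features are all assumptions made for the example, and the abstract does not say how the covariances were modeled.

```python
import math


def fit_gaussians(features, labels):
    """Estimate per-class mean, variance, and prior (diagonal covariance assumed)."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)

    models = {}
    for y, rows in by_class.items():
        n = len(rows)
        dim = len(rows[0])
        mean = [sum(r[d] for r in rows) / n for d in range(dim)]
        # A small floor keeps variances strictly positive (an assumption of this sketch).
        var = [sum((r[d] - mean[d]) ** 2 for r in rows) / n + 1e-6
               for d in range(dim)]
        prior = n / len(features)
        models[y] = (mean, var, prior)
    return models


def log_likelihood(x, mean, var):
    """Log density of x under an axis-aligned Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def classify(x, models):
    """Return the class with the highest log prior plus log likelihood."""
    return max(
        models,
        key=lambda y: math.log(models[y][2])
        + log_likelihood(x, models[y][0], models[y][1]),
    )


# Toy usage: two hypothetical phone classes with made-up 2-D feature vectors.
features = [[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [2.8, 2.9]]
labels = ["aa", "aa", "iy", "iy"]
models = fit_gaussians(features, labels)
predicted = classify([0.1, 0.0], models)
```

In a real comparison the feature vectors would come from one of the signal representations under study (for example BASC), and the same training and test split would be reused for every classifier so that error rates are comparable.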
