Real-world speech recognition with neural networks

We describe a system based on neural networks that is designed to recognize speech transmitted through the telephone network. Context-dependent phonetic modeling is studied as a method of improving recognition accuracy, and a special training algorithm is introduced to make the training of these nets more manageable. Our system is designed for real-world applications, and we have therefore specialized our implementation for this goal; a pipelined DSP structure and a compact search algorithm are described as examples of this specialization. Preliminary results from a realistic test of the system (a field trial for the U.S. Census Bureau) are reported.

Ronald A. Cole | Mark A. Fanty | Etienne Barnard | Pieter Vermeulen