Real-world speech recognition with neural networks

We describe a system based on neural networks that is designed to recognize speech transmitted through the telephone network. Context-dependent phonetic modeling is studied as a method of improving recognition accuracy, and a special training algorithm is introduced to make the training of these nets more manageable. Our system is designed for real-world applications, and we have therefore specialized our implementation for this goal; a pipelined DSP structure and a compact search algorithm are described as examples of this specialization. Preliminary results from a realistic test of the system (a field trial for the U.S. Census Bureau) are reported.
