Sensory integration in audiovisual automatic speech recognition

Methods of integrating audio and visual information in an audiovisual HMM-based automatic speech recognition (ASR) system are investigated. Experiments involve discrimination of a set of 22 consonants under various integration strategies. The role of the visual subsystem is varied: in one run, the subsystem attempts to classify all 22 consonants, while in other runs it attempts only broader classifications. In a second experiment, a new HMM formulation is employed that incorporates the integration into the HMM at a pre-categorical stage. A single variable parameter controls the relative contribution of the audio and visual information. This form of integration can be incorporated very easily into existing audio-based continuous speech recognizers.
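
The pre-categorical integration described above is commonly realized as a multi-stream HMM in which each state scores the audio and visual observations separately and combines the two log-likelihoods with a single weight. The sketch below illustrates that general idea; it is not the paper's exact formulation, and the names (lam, audio_gauss, visual_gauss) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def combined_log_likelihood(o_audio, o_visual, audio_gauss, visual_gauss, lam):
    """Weighted per-frame log-likelihood of one HMM state.

    o_audio, o_visual : feature vectors for the audio and visual streams
    audio_gauss, visual_gauss : (mean, cov) of the state's output densities
    lam : relative contribution of the audio stream (1 - lam for the visual stream)
    """
    log_a = multivariate_normal.logpdf(o_audio, *audio_gauss)
    log_v = multivariate_normal.logpdf(o_visual, *visual_gauss)
    # A single parameter lam in [0, 1] controls the audio/visual balance.
    return lam * log_a + (1.0 - lam) * log_v

# Example: a 2-D audio stream and a 1-D visual stream scored by one state.
audio_gauss = (np.zeros(2), np.eye(2))
visual_gauss = (np.zeros(1), np.eye(1))
score = combined_log_likelihood(np.array([0.1, -0.2]), np.array([0.05]),
                                audio_gauss, visual_gauss, lam=0.7)
print(score)
```

Because the combination happens inside the state's observation score, an existing audio-only decoder can adopt it by replacing the state likelihood computation alone, which is why this style of integration drops easily into existing continuous speech recognizers.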
