On the Integration of Auditory and Visual Parameters in an HMM-based ASR

In this paper, we describe two architectures for combining automatic speechreading and acoustic speech recognition. We propose a model that can improve the performance of an audio-visual speech recognizer in an isolated-word, speaker-dependent task. This is achieved with a hybrid system based on two HMMs trained respectively on acoustic and optic data. Both architectures have been tested on degraded audio over a wide range of signal-to-noise (S/N) ratios. The results of these experiments are presented and discussed.
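As a point of reference for the kind of hybrid system the abstract describes, the sketch below shows one common way to combine two separately trained HMMs at the decision level for isolated-word recognition: each stream's per-word log-likelihood is weighted and summed before picking the best word. This is a minimal illustration only; the library (hmmlearn), the linear weighting scheme, and all names (train_word_hmm, fuse_scores, lambda_av) are assumptions, not the architectures evaluated in the paper.

```python
# Minimal sketch: late (decision-level) fusion of acoustic and visual HMMs
# for isolated-word recognition. Illustrative assumptions throughout; not
# the authors' implementation.
import numpy as np
from hmmlearn import hmm


def train_word_hmm(features, lengths, n_states=5):
    """Fit a Gaussian HMM to all training utterances of one word.

    features: (n_frames, n_dims) array of stacked utterances
    lengths:  list of per-utterance frame counts
    """
    model = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=50)
    model.fit(features, lengths)
    return model


def fuse_scores(audio_models, video_models, audio_obs, video_obs,
                lambda_av=0.7):
    """Recognize an isolated word by weighting the two streams' scores.

    lambda_av near 1 trusts the acoustic stream; lower values favor the
    visual stream, e.g. at low S/N ratios (the weight is a free parameter
    here, not a value from the paper).
    """
    best_word, best_score = None, -np.inf
    for word in audio_models:
        score = (lambda_av * audio_models[word].score(audio_obs)
                 + (1.0 - lambda_av) * video_models[word].score(video_obs))
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

A design note on this sketch: keeping the two HMMs separate and fusing only their final scores lets the stream weight be tuned per noise condition, whereas an early-fusion alternative would concatenate acoustic and optic features and train a single HMM on the joint stream.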