Audiovisual Sensory Integration Using Hidden Markov Models

An improved method of integrating audio and visual information in an audiovisual hidden-Markov-model-based ASR system is investigated. The method uses an adaptive integration formula that incorporates the integration into the HMM at a pre-categorical stage, i.e. the streams are fused at the likelihood level rather than after separate classification decisions. A visual weighting parameter is determined automatically from HMM probability estimates, allowing the relative contribution of the audio and visual streams to be adjusted adaptively. Discrimination experiments were performed on a set of 22 consonants. The new method reduced the error rate by up to 25% compared to a fixed-parameter method, with the greatest improvement occurring at low signal-to-noise ratios.
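The pre-categorical fusion described above can be sketched as a weighted combination of per-class log-likelihoods from the two streams. The sketch below is an illustrative assumption, not the paper's exact formula: the visual weight `lam` is derived from the dispersion (normalised entropy) of the audio posteriors, so that flat, uninformative audio likelihoods (as at low SNR) shift weight toward the visual stream.

```python
import numpy as np

def adaptive_av_loglik(log_p_audio, log_p_visual):
    """Combine per-class audio and visual log-likelihoods adaptively.

    Illustrative scheme (an assumption, not the paper's formula):
    the visual weight is the normalised entropy of the audio
    posteriors, so confident audio keeps most of the weight and
    noisy, flat audio defers to the visual stream.
    """
    # Audio log-likelihoods -> posteriors (uniform class prior assumed)
    post = np.exp(log_p_audio - np.max(log_p_audio))
    post /= post.sum()
    # Normalised entropy in [0, 1]: 0 = confident audio, 1 = uninformative
    entropy = -np.sum(post * np.log(post + 1e-12)) / np.log(len(post))
    lam = float(np.clip(entropy, 0.0, 1.0))  # visual weighting parameter
    # Pre-categorical fusion: weighted sum of log-likelihoods per class
    return (1.0 - lam) * log_p_audio + lam * log_p_visual, lam
```

In a full system this combination would be applied to the per-state output probabilities inside the HMM before Viterbi decoding, rather than to final class scores.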