论文信息 - An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition

An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition

This paper presents a novel Hidden Markov Model architecture to model the joint probability of pairs of asynchronous sequences describing the same event. It is based on two other Markovian models, namely Asynchronous Input/Output Hidden Markov Models and Pair Hidden Markov Models. An EM algorithm to train the model is presented, as well as a Viterbi decoder that can be used to obtain the optimal state sequence as well as the alignment between the two sequences. The model has been tested on an audio-visual speech recognition task using the M2VTS database and yielded robust performances under various noise conditions.

Samy Bengio | Samy Bengio

[1] W. H. Sumby,et al. Visual contribution to speech intelligibility in noise , 1954 .

[2] A. Nakamura,et al. Nature (London , 1975 .

[3] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[4] Q. Summerfield,et al. Lipreading and audio-visual speech perception. , 1992, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[5] Samy Bengio,et al. An EM Algorithm for Asynchronous Input/Output Hidden Markov Models , 1996 .

[6] Luc Vandendorpe,et al. The M2VTS Multimodal Face Database (Release 1.00) , 1997, AVBPA.

[7] Sean R. Eddy,et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[8] Juergen Luettin,et al. Audio-Visual Speech Modeling for Continuous Speech Recognition , 2000, IEEE Trans. Multim..