Audio-visual graphical models for speech processing

Perceiving sounds in a noisy environment is a challenging problem. Visual lip-reading can provide relevant information but is also challenging because lips are moving and a tracker must deal with a variety of conditions. Typically audio-visual systems have been assembled from individually engineered modules. We propose to fuse audio and video in a probabilistic generative model that implements cross-model self-supervised learning, enabling adaptation to audio-visual data. The video model features a Gaussian mixture model embedded in a linear subspace of a sprite which translates in the video. The system can learn to detect and enhance speech in noise given only a short (30 second) sequence of audio-visual data. We show some results for speech detection and enhancement, and discuss extensions to the model that are under investigation.

[1]  T. Kristjansson,et al.  High resolution signal reconstruction , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).

[2]  Li Deng,et al.  Speech Denoising and Dereverberation Using Probabilistic Models , 2000, NIPS.

[3]  Brendan J. Frey,et al.  Learning Dynamic Noise Models from Noisy Speech for Robust Speech Recognition , 2001 .

[4]  Javier R. Movellan,et al.  Audio Vision: Using Audio-Visual Synchrony to Locate Sounds , 1999, NIPS.

[5]  Mark J. F. Gales,et al.  An improved approach to the hidden Markov model decomposition of speech and noise , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Nebojsa Jojic,et al.  Audio-Video Sensor Fusion with Probabilistic Graphical Models , 2002, ECCV.

[7]  J. Luettin,et al.  Audio-visual Speech Recognition Workshop 2000 Final Report , 2000 .

[8]  Brendan J. Frey,et al.  Estimating mixture models of images and inferring spatial transformations using the EM algorithm , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).