A diffusion network approach to visual speech recognition

In this paper we present an alternative to hidden Markov models for the recognition of image sequences. The approach is based on a stochastic version of recurrent neural networks, which we call diffusion networks. In contrast to hidden Markov models, diffusion networks operate with continuous state dynamics and generate continuous paths. This aspect may be beneficial in computer vision tasks in which continuity is a useful constraint. We review results required for the implementation of diffusion networks and then apply them to a visual speech recognition task. On this task, diffusion networks outperformed the best hidden Markov models.

Introduction

We present a novel way to recognize visual speech sequences using an extension of recurrent neural networks in which the dynamics are probabilistic [11, 10]. Instead of a set of ordinary differential equations (ODEs), diffusion networks are described by a set of stochastic differential equations (SDEs). SDEs provide a rich language for expressing stochastic temporal dynamics and have proven useful in formulating continuous-time statistical inference problems, resulting in such solutions as the continuous Kalman filter and generalizations of it such as the condensation algorithm [3].

The paper relies on aspects of measure theory and the probability theory of continuous-time processes that may be unfamiliar to some readers. For a concise review of these concepts the reader is referred to [5, Ch. 2].

Review of diffusion networks

A diffusion network is a set of coupled nodes whose dynamics are given by the Ito SDE

$$dX(t) = \mu(t, X(t), \lambda)\,dt + \sigma\,dB(t), \tag{1}$$