In this paper we present an alternative to hidden Markov models for the recognition of image sequences. The approach is based on a stochastic version of recurrent neural networks, which we call diffusion networks. In contrast to hidden Markov models, diffusion networks operate with continuous state dynamics and generate continuous paths. This property may be beneficial in computer vision tasks in which continuity is a useful constraint. We review the results required to implement diffusion networks and then apply them to a visual speech recognition task. On this task, diffusion networks outperformed the best hidden Markov models.

Introduction

We present a novel way to recognize visual speech sequences using an extension of recurrent neural networks in which the dynamics are probabilistic [11, 10]. Instead of a set of ordinary differential equations (ODEs), diffusion networks are described by a set of stochastic differential equations (SDEs). SDEs provide a rich language for expressing stochastic temporal dynamics and have proven useful in formulating continuous-time statistical inference problems, resulting in such solutions as the continuous Kalman filter and generalizations of it like the condensation algorithm [3]. The paper relies on aspects of measure theory and the probability theory of continuous-time processes that may be unfamiliar to some readers. For a concise review of these concepts the reader is referred to [5, Ch. 2].

Review of diffusion networks

A diffusion network is a set of coupled nodes whose dynamics are given by the Itô SDE

$$dX(t) = \mu(t, X(t), \lambda)\,dt + \sigma\,dB(t) \tag{1}$$
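As a rough illustration of how a path of equation (1) can be generated numerically, the sketch below applies the standard Euler-Maruyama discretization, reading the equation as a drift term mu (with parameters lambda) plus a constant dispersion sigma times a Brownian increment. Everything specific here is an assumption made for illustration: the function names `drift` and `euler_maruyama`, the tanh-coupled form of the drift, the weight matrix standing in for lambda, and the step size. It is not presented as the integration scheme or the network architecture used in the paper.

```python
import math
import random


def drift(t, x, weights):
    """Hypothetical drift mu(t, x, lambda) for a small coupled network.

    Each node leaks toward zero and is driven by tanh of every node's state.
    `weights` plays the role of the parameter lambda. This functional form is
    an illustrative assumption, not the drift defined in the paper.
    """
    n = len(x)
    return [
        -x[i] + sum(weights[i][j] * math.tanh(x[j]) for j in range(n))
        for i in range(n)
    ]


def euler_maruyama(x0, weights, sigma, t_end, n_steps, rng):
    """Simulate one sample path of dX = mu dt + sigma dB on [0, t_end].

    Returns a list of n_steps + 1 states, starting with x0.
    """
    dt = t_end / n_steps
    sqrt_dt = math.sqrt(dt)
    x = list(x0)
    path = [list(x)]
    for k in range(n_steps):
        mu = drift(k * dt, x, weights)
        # Each node receives an independent Brownian increment ~ N(0, dt).
        x = [
            x[i] + mu[i] * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
            for i in range(len(x))
        ]
        path.append(list(x))
    return path


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled path is reproducible
    weights = [[0.0, 1.5], [-1.5, 0.0]]  # arbitrary two-node coupling
    path = euler_maruyama([0.5, -0.5], weights, sigma=0.3,
                          t_end=5.0, n_steps=500, rng=rng)
    print(len(path), path[-1])
```

Setting `sigma=0` reduces the scheme to forward Euler integration of the corresponding ODE, which mirrors the relationship between diffusion networks and deterministic recurrent networks described above.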

References

[1] Juergen Luettin, et al. Visual Speech and Speaker Recognition. 1997.
[2] Juergen Luettin, et al. Speechreading using shape and intensity information. Proceeding of Fourth International Conference on Spoken Language Processing (ICSLP '96), 1996.
[3] P. Kloeden, et al. Numerical Solutions of Stochastic Differential Equations. 1995.
[4] G. Milstein. Numerical Integration of Stochastic Differential Equations. 1994.
[5] E. Helfand. Numerical integration of stochastic differential equations. The Bell System Technical Journal, 1979.
[6] Javier R. Movellan, et al. Dynamic Features for Visual Speechreading: A Systematic Comparison. NIPS, 1996.
[7] B. Efron. The jackknife, the bootstrap, and other resampling plans. 1987.
[8] Michael Isard, et al. Contour Tracking by Stochastic Propagation of Conditional Density. ECCV, 1996.
[9] Werner Römisch, et al. Numerical Solution of Stochastic Differential Equations (Peter E. Kloeden and Eckhard Platen). SIAM Rev., 1995.
[10] Juergen Luettin, et al. Statistical lip modelling for visual speech recognition. 8th European Signal Processing Conference (EUSIPCO 1996), 1996.
[11] Paul Mineiro, et al. Learning Path Distributions Using Nonequilibrium Diffusion Networks. NIPS, 1997.