Automatic lip motion detection

An algorithm for detecting a speaker's lip motion is presented, based on the processing of a colour video sequence of the speaker's face under natural lighting conditions and without any particular make-up. It is intended for applications in speech recognition, videoconferencing, and speaker face synthesis and animation. The algorithm is based on a statistical approach using Markov Random Field (MRF) modelling, with a spatiotemporal neighbourhood of the pixels in the image sequence. Two kinds of observations are used: the temporal difference between successive images (motion information) and the purity of the red hue in the current and past images (spatial information about lip location). The field of hidden labels, relevant for lip motion detection, is obtained by energy minimisation and proves to be robust to lighting conditions (shadows). This label field is used to extract not only qualitative information (mouth opening and closing) but also quantitative information, obtained by measuring geometrical features (horizontal and vertical lip spacing) directly on the label field.
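To make the processing chain concrete, the sketch below illustrates the kind of pipeline the abstract describes: computing the two observation fields (temporal difference and red-hue purity), estimating a binary label field by a simple energy minimisation, and measuring lip spacing on that field. It is a minimal illustration, not the authors' implementation: the red-purity formula, the thresholds and weights, and the use of Iterated Conditional Modes (ICM) with a 4-neighbour Potts prior are all assumptions made here for the sake of a runnable example.

```python
import numpy as np

def observations(frame_t, frame_prev):
    """Compute the two observation fields used by the model:
    a motion cue (temporal difference between successive frames) and a
    spatial lip cue (red-hue purity of the current frame)."""
    frame_t = frame_t.astype(float)
    frame_prev = frame_prev.astype(float)

    # Motion observation: absolute luminance difference between frames
    motion = np.abs(frame_t.mean(axis=2) - frame_prev.mean(axis=2))

    # Red-hue purity: share of red in the total intensity
    # (illustrative formulation, not the paper's exact definition)
    r, g, b = frame_t[..., 0], frame_t[..., 1], frame_t[..., 2]
    red_purity = r / (r + g + b + 1e-6)
    return motion, red_purity

def icm_label_field(motion, red_purity, n_iter=5, beta=1.0,
                    motion_thresh=15.0, hue_thresh=0.45):
    """Binary label field (1 = moving lip pixel, 0 = background) estimated
    by ICM energy minimisation with a 4-neighbour smoothness prior.
    Thresholds and the weight beta are illustrative values."""
    # Data term: 1 where both observations support a moving lip pixel
    data_on = ((motion > motion_thresh) & (red_purity > hue_thresh)).astype(float)
    labels = data_on.copy()
    for _ in range(n_iter):
        # Number of 'on' neighbours (4-connectivity, zero-padded borders)
        padded = np.pad(labels, 1)
        neigh_on = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                    padded[1:-1, :-2] + padded[1:-1, 2:])
        # Local energy of labelling a pixel 1 vs 0: data fidelity + Potts term
        e_on = (1.0 - data_on) + beta * (4.0 - neigh_on)
        e_off = data_on + beta * neigh_on
        labels = (e_on < e_off).astype(float)
    return labels

def lip_spacing(labels):
    """Horizontal and vertical lip spacing measured on the label field."""
    ys, xs = np.nonzero(labels)
    if ys.size == 0:
        return 0, 0
    return int(xs.max() - xs.min()), int(ys.max() - ys.min())

if __name__ == "__main__":
    # Synthetic example with two random RGB frames
    rng = np.random.default_rng(0)
    prev = rng.integers(0, 256, (120, 160, 3))
    curr = rng.integers(0, 256, (120, 160, 3))
    motion, red = observations(curr, prev)
    labels = icm_label_field(motion, red)
    print("horizontal/vertical lip spacing:", lip_spacing(labels))
```

In this sketch the qualitative opening/closing information would follow from how the vertical spacing evolves over successive frames, while the spacing values themselves give the quantitative geometrical features mentioned in the abstract.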