A posteriori models of lip shape and appearance for automatic audiovisual speech recognition
In this manuscript, we present our research on model-based parameter extraction from video sequences for automatic speechreading in natural, weakly constrained conditions. More precisely, we describe the a posteriori lip shape and appearance models that we propose, learnt from corpora. To be trained, these models require that lips can be located easily in images, which is not the case in natural images. As manually labelling images is time-consuming, and hardly feasible on a large corpus, we propose automatic methods instead, exploiting make-up and the bimodality of speech.

First, we define a shape model for the lips consisting of two polygons: one for the outer lip contour and one for the inner lip contour. According to an in-depth bibliographical study, this model captures most of the information useful for lipreading. To train this model statistically, we use video sequences in which the speakers wear blue lipstick, which makes boundary extraction easy. We learn the mean shape and the main deformations.

Next, we study statistical appearance models, which can only be trained on natural images. On such images, automatic lip location without external constraints remains an unsolved problem. To label lips automatically, we use two repetitions of the same sentence by the same subject, with and without blue make-up: once again, the blue sequence enables easy lip location, and dynamic time warping (DTW) allows us to estimate lip shapes in the natural images from the shapes extracted in the blue images. The resulting appearance model is very similar to the one obtained by training the same initial model on hand-labelled images, and is noticeably better than other models relying on hue. Moreover, the model we built can be adapted to any subject.
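To illustrate how the blue make-up simplifies boundary extraction, here is a minimal sketch in Python/OpenCV: the lipstick forms a saturated blue region that a simple chroma threshold isolates, and the region's external boundary and internal hole yield the outer and inner lip contours. The HSV bounds and the morphology step are illustrative assumptions, not the values used in the thesis.

```python
import cv2
import numpy as np

def extract_lip_contours(frame_bgr):
    """Return (outer, inner) lip contours from one blue-lipstick frame.
    inner is None when the mouth is closed (no hole in the blue region)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Illustrative blue band in HSV; the actual thresholds are corpus-dependent.
    mask = cv2.inRange(hsv, (100, 80, 40), (130, 255, 255))
    # Close small gaps so the lip region forms one connected component.
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_CCOMP,
                                           cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None, None
    # Largest top-level contour = outer lip boundary; its first child
    # (a hole in the blue region) = inner lip boundary.
    top = [i for i, h in enumerate(hierarchy[0]) if h[3] == -1]
    outer = max(top, key=lambda i: cv2.contourArea(contours[i]))
    child = hierarchy[0][outer][2]
    inner = contours[child] if child != -1 else None
    return contours[outer], inner
```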
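The mean shape and main deformations mentioned above are the standard ingredients of a statistical shape model. A minimal sketch, assuming the lip landmark sets have already been extracted from the blue sequences and normalised, computes them by PCA over the stacked (x, y) coordinates; the function names and the number of modes are hypothetical.

```python
import numpy as np

def train_shape_model(shapes, n_modes=5):
    """shapes: (n_samples, n_points, 2) lip landmarks, e.g. the outer and
    inner contour polygons concatenated. Returns the mean shape, the main
    deformation modes (PCA eigenvectors) and their variances."""
    X = shapes.reshape(len(shapes), -1)            # flatten (x, y) pairs
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    modes = Vt[:n_modes]                           # principal deformations
    variances = (S[:n_modes] ** 2) / (len(shapes) - 1)
    return mean, modes, variances

def synthesize(mean, modes, b):
    """Reconstruct a lip shape from a vector b of mode coefficients."""
    return (mean + b @ modes).reshape(-1, 2)
```

Keeping each coefficient of b within a few standard deviations of its mode constrains synthesized shapes to plausible lip configurations, which is the usual way such a model is exploited during tracking.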
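Finally, a sketch of the DTW alignment used to transfer shapes from the blue sequence to the natural one, assuming each repetition of the sentence is represented by a sequence of per-frame feature vectors (e.g. audio features). This is a generic textbook DTW, not the thesis's exact formulation.

```python
import numpy as np

def dtw_path(A, B):
    """A: (m, d) and B: (n, d) feature sequences for the blue and natural
    repetitions. Returns the optimal alignment as (i, j) index pairs."""
    m, n = len(A), len(B)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end to recover the warping path.
    path, i, j = [], m, n
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

Each natural frame j then inherits the lip shape extracted on its aligned blue frame i, providing the automatic labels on which the appearance model is trained.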