Facial Feature Extraction using Deformable Graphs and Statistical Pattern Matching

In model-based coding of image sequences containing human faces, e.g., videophone sequences, the detection and location of the face, as well as the extraction of facial features from the images, are crucial. Facial feature extraction can be regarded as an optimization problem: searching for the optimal adaptation parameters of the model. The optimum is defined as the minimum distance between the extracted face and a face space. There are different approaches to reducing the computational complexity; here, a scheme using deformable graphs and dynamic programming is described. Experiments have been performed with promising results.

I. MODEL-BASED CODING

Since the major application of the techniques described in this document is model-based coding, an introduction to that topic follows. For more details, see [2, 9, 10, 14].

The basic idea of model-based coding of video sequences is illustrated in Fig. 1. At the encoding side of a visual communication system (typically a videophone system), the image from the camera is analysed using computer vision techniques, and the relevant objects, for example a human face, are identified. A general or specific model is then adapted to the object; usually the model is a wireframe describing the 3-D shape of the object.

Instead of transmitting the full image pixel by pixel, or as coefficients describing the waveform of the image, the image is handled as a 2-D projection of 3-D objects in a scene. To achieve this, parameters describing the objects are extracted, coded and transmitted. Typical parameters are size, position and shape. To achieve acceptable visual similarity to the original image, the texture of the object is also transmitted. The texture can be compressed by a traditional image coding technique, but specialized techniques that lower the bit rate considerably for certain applications have recently been published [15, 16].
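The view of an image as a 2-D projection of a parameterized 3-D model can be illustrated with a small sketch. It is not the author's implementation: the function name `project_wireframe`, the pinhole camera with `focal_length`, and the toy square "wireframe" are assumptions made only for illustration. The sketch applies a size parameter (`scale`) and a position parameter (`translation`) to the model vertices and projects them onto the image plane.

```python
def project_wireframe(vertices, scale, translation, focal_length=1.0):
    """Scale and translate 3-D vertices, then project them to 2-D.

    Hypothetical helper: a pinhole camera at the origin looking along +z
    is assumed; the source text does not specify a camera model.
    """
    tx, ty, tz = translation
    projected = []
    for x, y, z in vertices:
        # Apply the size and position parameters to the model vertex.
        cam_x = scale * x + tx
        cam_y = scale * y + ty
        cam_z = scale * z + tz
        if cam_z <= 0:
            raise ValueError("vertex lies on or behind the camera plane")
        # Perspective projection onto the image plane.
        projected.append((focal_length * cam_x / cam_z,
                          focal_length * cam_y / cam_z))
    return projected


# Toy "wireframe": the four corners of a unit square facing the camera.
square = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0),
          (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]

points = project_wireframe(square, scale=2.0, translation=(0.0, 0.0, 4.0))
print(points)
```

Only `scale` and `translation` (four numbers here) would need to be sent to place this model, rather than the pixels it covers, which is the source of the bit-rate saving described above.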
At the receiver side of the system, the parameters are decoded and the decoder’s model is modified accordingly. The model is then synthesized as a visual object using computer graphics techniques, e.g., a wireframe is shaped according to the shape and size parameters and the texture is mapped onto its surfaces.

For the subsequent images, parameters describing the change of the model are transmitted. Typically, those parameters tell how to rotate and translate the model and, in the case of a non-rigid object like a human face, describe the motion of individual vertices of the wireframe. This constitutes the largest gain of model-based coding, since the motion parameters can be transmitted at very low bit rates [1]. Definitions for the coding and representation of parameters for model-based coding and animation of human faces are included in the recently established international standard MPEG-4 [12, 13].

Components of a Model-Based Coding System

To encode an image sequence in a model-based scheme, we first need to detect and locate the face. This can be done by, e.g., colour discrimination, detection of elliptical objects using Hough transforms, connectionist/neural network methods, or statistical pattern matching.

Fig. 1. (Figure not reproduced; surviving labels: Original, Encoder, Model, Channel, Decoder, Model.)
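Statistical pattern matching, together with the abstract's definition of the optimum as the minimum distance between the extracted face and a face space, can be sketched as follows. This is a brute-force illustration under stated assumptions, not the deformable-graph and dynamic-programming scheme the paper describes: the face space is assumed to be given as a mean vector plus an orthonormal basis (for example, from a principal component analysis), and the names `distance_from_face_space`, `mean_face`, `basis` and `candidates` are hypothetical.

```python
import math


def distance_from_face_space(patch, mean_face, basis):
    """Euclidean distance between a patch and its projection onto the face space.

    Assumes `basis` is a list of orthonormal vectors spanning the face space
    and that all vectors have the same length as `patch`.
    """
    centered = [p - m for p, m in zip(patch, mean_face)]
    # Coordinates of the centered patch in the face-space basis.
    coeffs = [sum(u_k * c_k for u_k, c_k in zip(u, centered)) for u in basis]
    # Reconstruction of the patch from those coordinates.
    reconstruction = [
        sum(w * u[k] for w, u in zip(coeffs, basis))
        for k in range(len(centered))
    ]
    residual = [c - r for c, r in zip(centered, reconstruction)]
    return math.sqrt(sum(r * r for r in residual))


# Toy face space: the plane spanned by the first two axes of a 3-D space.
mean_face = [0.0, 0.0, 0.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# Hypothetical candidate patches, e.g. extracted at different positions.
candidates = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 0.5]}

distances = {name: distance_from_face_space(patch, mean_face, basis)
             for name, patch in candidates.items()}
best = min(distances, key=distances.get)
print(distances, best)
```

Evaluating every candidate in this way is what makes the search expensive, which motivates the complexity-reducing schemes mentioned in the abstract.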

[1] Niclas Wiberg, et al. Codes and Decoding on General Graphs, 1996.

[2] Haibo Li, et al. Image sequence coding at very low bit rates: a review, 1994, IEEE Trans. Image Process.

[3] Ian Craw, et al. Automatic extraction of face-features, 1987, Pattern Recognit. Lett.

[4] Bernd Girod, et al. Motion Analysis and Image Sequence Processing, 1993.

[5] Anastasios Tefas, et al. Multi modal verification for teleservices and security applications (M2VTS), 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[6] Pertti Roivainen, et al. 3-D Motion Estimation in Model-Based Facial Image Coding, 1993, IEEE Trans. Pattern Anal. Mach. Intell.

[7] J. Ahlberg, et al. Representing and Compressing MPEG-4 Facial Animation Parameters using Facial Action Basis Functions, 1998.

[8] T. F. Cootes. Vision through Optimization, 1996.

[9] John Daugman, et al. High Confidence Visual Recognition of Persons by a Test of Statistical Independence, 1993, IEEE Trans. Pattern Anal. Mach. Intell.

[10] Haibo Li. Facial Texture Compression for Model-based Coding, 1998.

[11] Margrit Betke, et al. Fast object recognition in noisy images using simulated annealing, 1995, Proceedings of IEEE International Conference on Computer Vision.

[12] Patrick Pérez, et al. Generalized Likelihood Ratio-based Face Detection and Extraction of Mouth Features, 1997, AVBPA.

[13] Haibo Li, et al. Very Low Bit Rate Facial Texture Coding, 1997.

[14] C. S. Choi, et al. Human Facial Motion Analysis and Synthesis with Application to Model-Based Coding, 1993.