Model-aided coding: a new approach to incorporate facial animation into motion-compensated video coding

We show that traditional waveform coding and 3-D model-based coding are not competing alternatives, but should be combined to support and complement each other. The two approaches are combined such that the generality of waveform coding and the efficiency of 3-D model-based coding are each available where needed. The combination is achieved by providing the block-based video coder with a second reference frame for prediction, which is synthesized by the model-based coder. The model-based coder uses a parameterized 3-D head model that specifies the shape and color of a person. We therefore restrict our investigations to typical videotelephony scenarios that show head-and-shoulder scenes. Motion and deformation of the 3-D head model constitute facial expressions, which are represented by facial animation parameters (FAPs) based on the MPEG-4 standard. An intensity gradient-based approach that exploits the 3-D model information is used to estimate the FAPs, as well as illumination parameters that describe changes of brightness in the scene. Model failures and objects that are not known at the decoder are handled by standard block-based motion-compensated prediction, which is not restricted to specific scene content but results in lower coding efficiency. A Lagrangian approach is employed to determine the most efficient prediction for each block from either the synthesized model frame or the previous decoded frame. Experiments on five video sequences show that bit rate savings of about 35% are achieved at equal average peak signal-to-noise ratio (PSNR) when comparing the model-aided codec to TMN-10, the state-of-the-art test model of the H.263 standard. This corresponds to a gain of 2-3 dB in PSNR when encoding at the same average bit rate.
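The per-block Lagrangian decision mentioned above can be illustrated with a minimal Python sketch. It is not the codec's actual mode decision: it assumes the distortion D is the sum of squared differences between the original block and each candidate prediction, that the rate R of each candidate is already known in bits, and that the cost J = D + lambda * R is minimized over the two reference frames. The function names, the example rates, and the multiplier values are hypothetical.

```python
from typing import List, Sequence, Tuple

Block = Sequence[Sequence[float]]


def ssd(a: Block, b: Block) -> float:
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def choose_reference(block: Block,
                     candidates: List[Tuple[str, Block, float]],
                     lam: float) -> Tuple[str, float]:
    """Pick the candidate that minimizes J = D + lam * R.

    Each candidate is (name, predicted_block, rate_in_bits).
    The rate values are assumed inputs, not derived from a real bitstream.
    """
    best_name, best_cost = "", float("inf")
    for name, prediction, rate in candidates:
        cost = ssd(block, prediction) + lam * rate
        if cost < best_cost:  # strict comparison: the first candidate wins ties
            best_name, best_cost = name, cost
    return best_name, best_cost


# Hypothetical 2x2 block and two predictions of it.
original = [[10, 12], [11, 13]]
candidates = [
    ("previous_frame", [[10, 12], [11, 14]], 20.0),  # D = 1, costly to signal
    ("model_frame",    [[8, 12], [11, 13]],   4.0),  # D = 4, cheap to signal
]

choice_low, cost_low = choose_reference(original, candidates, lam=0.0)
choice_high, cost_high = choose_reference(original, candidates, lam=0.5)
```

With lam = 0 only distortion counts, so the previous decoded frame is selected; with lam = 0.5 the lower rate of the model frame outweighs its higher distortion. A larger multiplier shifts the decision toward whichever reference is cheaper to signal.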
