Motion-Adaptive Modelling of Scene Content for Very Low Bit Rate Model-Assisted Coding of Video

A method and apparatus for video coding in which a region of an image that contains a predetermined object, such as a person's face, within the foreground portion of the image is determined automatically. Specifically, the foreground portion of the image is identified, and one or more predetermined geometric shapes (e.g., ellipses) are compared with the shapes of objects found in that foreground portion. The foreground portion may be determined by performing a global motion estimation on the overall image to detect global image movement resulting, for example, from camera pan and zoom. The portion of the image whose movement is consistent with the estimated global motion may be identified as the background, with the remainder of the image identified as the foreground. The identified region containing the predetermined object and the portions of the image that do not contain it may then be coded with differing levels of coding accuracy (e.g., using different quantization levels). If the identified region contains, for example, a person's face, the coding quality of the face may thus be improved relative to that of other portions of the image.
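The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. It makes several assumptions that do not come from the source: motion is given as one vector per block, global motion is modelled as a simple pan-plus-zoom field (u = a*x + tx, v = a*y + ty) fitted by least squares, ellipse comparison is approximated by a Jaccard overlap score on block sets, and the quantizer values 8 and 24 are arbitrary placeholders. All function names are hypothetical.

```python
import math
from statistics import mean

def fit_pan_zoom(samples):
    """Least-squares fit of u = a*x + tx, v = a*y + ty to (x, y, u, v) samples."""
    mx = mean(s[0] for s in samples)
    my = mean(s[1] for s in samples)
    mu = mean(s[2] for s in samples)
    mv = mean(s[3] for s in samples)
    num = sum((x - mx) * (u - mu) + (y - my) * (v - mv) for x, y, u, v in samples)
    den = sum((x - mx) ** 2 + (y - my) ** 2 for x, y, u, v in samples)
    a = num / den if den else 0.0  # zoom term; 0 if all positions coincide
    return a, mu - a * mx, mv - a * my

def segment_foreground(vectors, threshold=1.0, iterations=3):
    """Return blocks whose motion disagrees with the fitted global motion.

    vectors maps block position (x, y) to its motion vector (u, v).
    The threshold (in pixels) and iteration count are assumed values.
    """
    inliers = list(vectors)
    foreground = set()
    for _ in range(iterations):
        if len(inliers) < 3:  # too few background blocks to fit the model
            break
        a, tx, ty = fit_pan_zoom([(x, y, *vectors[(x, y)]) for x, y in inliers])
        foreground = {
            (x, y) for (x, y), (u, v) in vectors.items()
            if math.hypot(u - (a * x + tx), v - (a * y + ty)) > threshold
        }
        # Refit on the blocks currently believed to be background.
        inliers = [p for p in vectors if p not in foreground]
    return foreground

def ellipse_blocks(ellipse, blocks):
    """Blocks whose position lies inside ellipse = (cx, cy, rx, ry)."""
    cx, cy, rx, ry = ellipse
    return {(x, y) for x, y in blocks
            if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0}

def best_ellipse(foreground, blocks, candidates):
    """Pick the candidate ellipse that best overlaps the foreground blocks."""
    def jaccard(ellipse):
        inside = ellipse_blocks(ellipse, blocks)
        union = inside | foreground
        return len(inside & foreground) / len(union) if union else 0.0
    return max(candidates, key=jaccard)

def quantizer_map(blocks, face_blocks, q_face=8, q_other=24):
    """Assign a finer quantizer step to the face region (placeholder values)."""
    return {b: (q_face if b in face_blocks else q_other) for b in blocks}
```

A caller would pass block motion vectors to `segment_foreground`, select a face region with `best_ellipse` from a list of candidate ellipses, and hand the result of `ellipse_blocks` to `quantizer_map`. A real coder would more likely use a robust estimator and a contour-based ellipse fit; the simple refit loop here only shows how background blocks can be separated from blocks that move independently.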
