Video representation with three-dimensional entities

Very low bit-rate coding requires new paradigms that go well beyond pixel- and frame-based video representations. We introduce a novel content-based video representation built from three-dimensional (3-D) entities: textured object models and pose estimates. The multiproperty object models carry stochastic information about the shape and texture of each object in the scene, and the pose estimates give the position and orientation of each object in every frame. The representation is compact, and it offers an alternative way to handle video: by manipulating and compositing 3-D entities. We call this representation tridimensional video compositing, or 3DVC. We present the 3DVC framework and describe the methods used to incrementally construct the object models and the pose estimates from unregistered, noisy depth and texture measurements. We also describe a method for reconstructing video frames by 3-D scene assembly, and discuss potential applications of 3DVC to video coding and content-based video handling. 3DVC assumes that the objects in the scene are rigid and already segmented; by assuming segmentation, we set aside the difficult problems of nonrigid segmentation and multiple-object segmentation. In our experiments, segmentation is obtained by depth thresholding, but 3DVC does not depend on the segmentation technique adopted. Experiments with synthetic and real video sequences achieve compression ratios from 1:150 to 1:2700, demonstrating that the proposed representation is applicable to very low bit-rate coding.
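Two steps named above, segmentation by depth thresholding and incremental construction of a stochastic shape model from noisy depth measurements, can be illustrated with a minimal sketch. It is not the authors' method. It assumes the depth maps have already been registered to the object's coordinate frame (registration is omitted), that each object sample carries a Gaussian depth estimate (mean, variance), and that all measurements share one noise variance `var`. The names `segment_by_depth`, `StochasticDepthModel`, and `integrate_frame`, and the inverse-variance update itself, are illustrative choices.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column) of a depth-map sample


def segment_by_depth(depth: List[List[float]], near: float, far: float) -> List[List[bool]]:
    """Mark pixels whose depth lies in [near, far] as belonging to the object."""
    return [[near <= d <= far for d in row] for row in depth]


class StochasticDepthModel:
    """Hypothetical per-cell Gaussian estimate (mean, variance) of object depth,
    refined incrementally as new measurements arrive. Assumes the measurements
    are already registered to the object's coordinate frame."""

    def __init__(self) -> None:
        self.cells: Dict[Cell, Tuple[float, float]] = {}

    def update(self, cell: Cell, z: float, var: float) -> Tuple[float, float]:
        """Fuse one depth measurement z with noise variance var into the cell."""
        if cell not in self.cells:
            # The first observation initializes the estimate.
            self.cells[cell] = (z, var)
        else:
            mean, prior_var = self.cells[cell]
            # Inverse-variance weighting (scalar Kalman update for a static value).
            post_var = 1.0 / (1.0 / prior_var + 1.0 / var)
            post_mean = post_var * (mean / prior_var + z / var)
            self.cells[cell] = (post_mean, post_var)
        return self.cells[cell]

    def integrate_frame(self, depth: List[List[float]], mask: List[List[bool]], var: float) -> None:
        """Fuse every segmented pixel of one depth map into the model."""
        for v, row in enumerate(depth):
            for u, z in enumerate(row):
                if mask[v][u]:
                    self.update((v, u), z, var)


# Two noisy 2x2 depth maps: the left column is the object, the right is background.
model = StochasticDepthModel()
for depth in ([[2.0, 9.0], [2.2, 9.5]], [[2.4, 9.0], [2.2, 9.5]]):
    model.integrate_frame(depth, segment_by_depth(depth, 1.0, 5.0), var=0.25)
print(model.cells[(0, 0)])  # mean close to 2.2, variance 0.125
```

Each fused measurement adds its precision (1/variance), so a cell's variance shrinks as consistent measurements accumulate, while background pixels outside the depth window never enter the model. Texture values could plausibly be fused the same way; that is an assumption, not something the abstract specifies.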

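Reconstructing a video frame by 3-D scene assembly, with each object placed according to its per-frame pose estimate, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' renderer: each object model is taken to be a set of 3-D points with one texture value each, each pose is a rotation matrix `R` and translation `t` mapping model coordinates to camera coordinates, and the camera is an ideal pinhole with focal length `focal` (in pixels) and principal point at the image center. The function names are hypothetical, and a simple point-splatting z-buffer stands in for a full renderer.

```python
import math
from typing import Dict, List, Optional, Tuple

Point3 = Tuple[float, float, float]
Matrix3 = List[List[float]]


def rotation_z(theta: float) -> Matrix3:
    """Rotation by theta radians about the camera's optical (z) axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_pose(R: Matrix3, t: Point3, p: Point3) -> Point3:
    """Map a model-frame point into the camera frame: x_cam = R p + t."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def render_frame(
    models: Dict[str, List[Tuple[Point3, int]]],
    poses: Dict[str, Tuple[Matrix3, Point3]],
    width: int,
    height: int,
    focal: float,
) -> List[List[Optional[int]]]:
    """Assemble the 3-D scene for one frame: pose every textured model point,
    project it with a pinhole camera, and keep the nearest point per pixel."""
    depth = [[math.inf] * width for _ in range(height)]
    frame: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    cx, cy = width / 2.0, height / 2.0  # principal point at the image center
    for name, points in models.items():
        R, t = poses[name]  # this frame's pose estimate for the object
        for p, texture in points:
            x, y, z = apply_pose(R, t, p)
            if z <= 0.0:  # point lies behind the camera
                continue
            u = math.floor(focal * x / z + cx)
            v = math.floor(focal * y / z + cy)
            # z-buffer test: a nearer object occludes a farther one
            if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
                depth[v][u] = z
                frame[v][u] = texture
    return frame


# Two hypothetical rigid objects; "near" is rotated 90 degrees about the optical axis.
models = {
    "far": [((0.0, 0.0, 0.0), 10)],
    "near": [((0.0, 0.0, 0.0), 20), ((1.0, 0.0, 0.0), 30)],
}
poses = {
    "far": (rotation_z(0.0), (0.0, 0.0, 5.0)),
    "near": (rotation_z(math.pi / 2), (0.0, 0.0, 3.0)),
}
frame = render_frame(models, poses, width=5, height=5, focal=3.0)
```

Here the "near" object occludes the "far" one at the image center, and its second point lands one row lower after the rotation. Once the models are available, each new frame in this sketch needs only a rotation and a translation per object, which suggests why a pose-based representation can be compact.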