A subspace approach to layer extraction, patch-based SFM, and video compression

Representing videos with layers has important applications such as video compression, motion analysis, 3D modeling and rendering. This thesis proposes a subspace approach to extracting layers from video by taking advantage of the fact that homographies induced by planar patches in the scene form a low-dimensional linear subspace. In the subspace, layers in the input images are mapped onto well-defined clusters, and can be reliably identified by a standard clustering algorithm (e.g., mean-shift). Global optimality is achieved since both spatial and temporal redundancy are simultaneously taken into account, and noise can be effectively reduced by enforcing the subspace constraint. The existence of the subspace also enables outlier detection, making the subspace computation robust. Based on the subspace constraint, we propose a patch-based scheme for affine structure from motion (SFM), which recovers the plane equation of each planar patch in the scene, as well as the camera epipolar geometry. We propose two approaches to patch-based SFM: (1) a factorization approach; and (2) a layer-based approach. Patch-based SFM provides a compact video representation that can be used to construct a high-quality texture map for each layer. We plan to apply our approach to generating Video Object Planes (VOPs) as defined by the MPEG-4 standard. VOP generation is a critical but unspecified step in the MPEG-4 standard. Our motion model for each VOP consists of a global planar motion and localized deformations, and it admits a closed-form solution. Our goals are: (1) combining different low-level cues to model VOPs; and (2) extracting VOPs that undergo more complicated motion (non-planar or non-rigid).
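The subspace-then-cluster idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the data are synthetic (each patch simply copies its layer's stacked motion parameters plus noise), the subspace dimension `d=3`, the bandwidth, and the helper names `project_to_subspace` and `mean_shift` are assumptions made for illustration, and the mean-shift routine is a plain Gaussian-kernel version written with NumPy only.

```python
import numpy as np

def project_to_subspace(W, d):
    """Project the columns of W (one column per patch) onto the
    top-d left singular vectors; returns an array of shape (num_patches, d)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (np.diag(s[:d]) @ Vt[:d]).T

def mean_shift(points, bandwidth, n_iter=50):
    """Simple Gaussian-kernel mean shift: move every point toward a local
    density mode, then merge modes closer than `bandwidth`."""
    modes = points.copy()
    for _ in range(n_iter):
        # squared distances between current modes and the original points
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        modes = (w @ points) / w.sum(axis=1, keepdims=True)
    labels = np.empty(len(points), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth:
                labels[i] = k
                break
        else:  # no existing center is close enough: start a new cluster
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# Synthetic setup (assumed, for illustration only): 2 layers, 20 patches each,
# 5 frames, 6 affine motion parameters per frame stacked into one column.
rng = np.random.default_rng(0)
num_frames, patches_per_layer = 5, 20
true_labels = np.repeat([0, 1], patches_per_layer)
layer_motion = rng.normal(size=(6 * num_frames, 2))
W = layer_motion[:, true_labels] + 0.01 * rng.normal(size=(6 * num_frames, true_labels.size))

coords = project_to_subspace(W, d=3)           # low-dimensional patch coordinates
labels, centers = mean_shift(coords, bandwidth=1.0)
print("clusters found:", len(centers))
```

In this toy setting the patches of each layer collapse onto a tight cluster in the projected space, so the clustering step only has to separate well-isolated modes. With real video the measurement matrix would be built from estimated patch motions, and the outlier handling mentioned in the abstract would be needed before the projection.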
