Motion Segmentation by Spatiotemporal Smoothness Using 5D Tensor Voting

Our goal is to recover the temporal trajectories of all pixels in a reference image of a given image sequence, and to segment the image based on motion similarities. These trajectories can be visualized in the 3D (x, y, t) spatiotemporal volume. The mathematical formalism describing the evolution of pixels over time is that of fiber bundles, but it is difficult to implement directly. Instead, we express the problem in a higher-dimensional 5D space, in which pixels with coherent apparent motion form smooth 3D layers. The coordinates in this 5D space are (x, y, t, vx, vy), and the space is initially populated with the peaks of local correlation. We then enforce smoothness in the spatial and temporal domains simultaneously, using the tensor voting framework. Unlike the previous 4D approach, which uses only two frames, we take full advantage of the temporal information available in multiple images, which significantly improves the motion analysis results. The approach is generic, in the sense that it makes no restrictive assumptions about the observed scene or the camera motion. We present results on real data sets, which remain good even on challenging image sequences with significant occlusion.
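To make the first stage of this pipeline concrete, the sketch below shows one plausible way to populate the 5D (x, y, t, vx, vy) space with per-pixel correlation peaks, which would then serve as input tokens to the tensor voting stage. This is a minimal, hypothetical Python sketch and not the authors' implementation: the patch size, search range, brute-force normalized cross-correlation matching, and the function name correlation_peak_tokens are illustrative assumptions.

```python
# Hedged sketch (not the paper's code): build 5D tokens (x, y, t, vx, vy)
# from correlation peaks between consecutive grayscale frames.
import numpy as np

def correlation_peak_tokens(frames, patch=5, search=4):
    """For each pixel of each frame pair (t, t+1), find the displacement
    (vx, vy) whose patch gives the highest normalized cross-correlation,
    and return the resulting 5D tokens as an (N, 5) array."""
    half = patch // 2
    tokens = []
    for t in range(len(frames) - 1):
        a, b = frames[t].astype(float), frames[t + 1].astype(float)
        h, w = a.shape
        for y in range(half + search, h - half - search):
            for x in range(half + search, w - half - search):
                ref = a[y - half:y + half + 1, x - half:x + half + 1]
                ref = (ref - ref.mean()) / (ref.std() + 1e-8)
                best, best_v = -np.inf, (0, 0)
                for vy in range(-search, search + 1):
                    for vx in range(-search, search + 1):
                        cand = b[y + vy - half:y + vy + half + 1,
                                 x + vx - half:x + vx + half + 1]
                        cand = (cand - cand.mean()) / (cand.std() + 1e-8)
                        score = (ref * cand).mean()  # normalized cross-correlation
                        if score > best:
                            best, best_v = score, (vx, vy)
                tokens.append((x, y, t, best_v[0], best_v[1]))
    return np.array(tokens, dtype=float)
```

Under this reading, each resulting token would be encoded as an unoriented (ball) tensor, and voting in the 5D space would reinforce tokens whose velocities vary smoothly in both space and time, so that coherent motions emerge as smooth layers while outliers receive little support.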
