Structure from Motion: Beyond the Epipolar Constraint

The classic approach to structure from motion entails a clear separation between motion estimation and structure estimation, and between two-dimensional (2D) and three-dimensional (3D) information. For the recovery of the rigid transformation between different views, only 2D image measurements are used. To make enough information available, most existing techniques rely on the intermediate computation of optical flow, which, however, poses a problem at depth discontinuities. If we knew where the depth discontinuities were, we could (using any of a multitude of approaches based on smoothness constraints) accurately estimate flow values for image patches corresponding to smooth scene patches; but knowing the discontinuities requires solving the structure from motion problem first. This paper introduces a novel approach to structure from motion that addresses smoothing, 3D motion estimation, and structure estimation in a synergistic manner. It provides an algorithm for estimating the transformation between two views obtained by either a calibrated or an uncalibrated camera. The results of the estimation are then used to reconstruct the scene from a short sequence of images. The technique is based on constraints on image derivatives that involve the 3D motion and shape of the scene, leading to a geometric and statistical estimation problem. The interaction between 3D motion and shape allows us to estimate the 3D motion while simultaneously segmenting the scene. If we use a wrong 3D motion estimate to compute depth, we obtain a distorted version of the depth function. The distortion, however, is such that the worse the motion estimate, the more likely we are to obtain depth estimates that vary locally more than the correct ones.
Since local variability of depth is due either to the existence of a discontinuity or to a wrong 3D motion estimate, being able to differentiate between these two cases provides the correct motion, which yields the “least varying” estimated depth as well as the image locations of scene discontinuities. We analyze the new constraints, show their relationship to the minimization of the epipolar constraint, and present experimental results using real image sequences that indicate the robustness of the method.
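The core observation — that a wrong 3D motion estimate, when inverted into depth, produces a depth map with higher local variability than the true motion does — can be illustrated with a minimal synthetic sketch. This is not the paper's algorithm; it assumes pure camera translation (U, V, W), known focal length, and noise-free flow generated under the standard translational motion-field equations u = (−fU + xW)/Z, v = (−fV + yW)/Z. All names below are illustrative:

```python
import numpy as np

# Synthetic setup: a smooth slanted surface viewed under pure translation.
f = 1.0
xs, ys = np.meshgrid(np.linspace(-0.5, 0.5, 40), np.linspace(-0.5, 0.5, 40))
Z = 4.0 + 0.5 * xs  # smooth scene patch (no depth discontinuities)

def flow(t):
    """Translational motion field: u = (-f*U + x*W)/Z, v = (-f*V + y*W)/Z."""
    U, V, W = t
    return (-f * U + xs * W) / Z, (-f * V + ys * W) / Z

def estimated_depth(u, v, t_hat):
    """Per-pixel least-squares depth implied by a candidate translation t_hat."""
    U, V, W = t_hat
    a = -f * U + xs * W
    b = -f * V + ys * W
    # Solve min_Z (a/Z - u)^2 + (b/Z - v)^2 for 1/Z, then invert.
    return (a * a + b * b) / (a * u + b * v + 1e-12)

def variability(Zh):
    """Local depth variability: mean squared neighbour differences."""
    return (np.mean((Zh[:, 1:] - Zh[:, :-1]) ** 2)
            + np.mean((Zh[1:, :] - Zh[:-1, :]) ** 2))

t_true = (0.2, 0.0, 1.0)
u, v = flow(t_true)

# The true motion recovers the smooth surface; a wrong motion yields a
# distorted, locally far more variable depth map.
score_true = variability(estimated_depth(u, v, t_true))
score_wrong = variability(estimated_depth(u, v, (0.0, 0.2, 1.0)))
```

Running this, `score_true` stays small (the slanted surface is recovered essentially exactly), while the wrong candidate inflates `score_wrong` by orders of magnitude, so selecting the motion with the least-varying estimated depth picks out `t_true` — the same selection principle the paper develops, there combined with discontinuity detection and the uncalibrated case.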