Motion and Structure from Image Sequences

Estimating the motion and structure of a scene from image sequences is an important and active research area in computer vision. The results have applications in vision-guided navigation, robot vision, and 3-D object recognition and manipulation, among others. Many of the theoretical results and techniques developed may also apply to related problems in other fields. Computing the image displacement field, or matching two images, is one of the difficult problems in motion analysis. A computational approach to image matching has been developed that uses multiple attributes associated with images to yield a generally overdetermined system of matching constraints, taking into account possible structural discontinuities and occlusions. The next step is to compute the motion parameters and the structure of the scene from the computed image displacement field. A two-step approach is introduced to solve this nonlinear optimization problem reliably and efficiently. The uniqueness of the solution, its robustness in the presence of noise, the estimation of errors, and the dependence of its reliability on the motion, the scene, and the parameters of the image sensors have been investigated. Analysis shows that a batch processing technique (the Levenberg-Marquardt nonlinear least-squares method) generally performs better than a sequential processing technique (iterated extended Kalman filtering) for nonlinear problems. For problems where estimates are needed before all the data are acquired, a recursive batch processing technique has been developed to improve performance and computational efficiency. The performance of the motion estimation algorithm has essentially reached the Cramér-Rao bound. The algorithm has been applied to real-world scenes with depth discontinuities and occlusions to compute motion parameters, dense depth maps, and occlusion maps from two images taken at different unknown positions and orientations relative to the scene.
The standard discrepancy between the projection of the inferred 3-D scene and the actually observed projection is as small as one half of a pixel. Other problems investigated include (1) motion and structure from point correspondences for planar scenes; (2) motion and structure from line correspondences; and (3) dynamic motion estimation and prediction from long image sequences.
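
As a rough illustration of the discrepancy measure mentioned above, the following Python sketch reprojects inferred 3-D points through a pinhole camera and reports the root-mean-square image distance to the observed points. Reading "standard discrepancy" as an RMS distance is an assumption, as are the pinhole model and the names project, rms_discrepancy, and focal_length; none of them is taken from the original work.

```python
import math


def project(point, rotation, translation, focal_length):
    """Pinhole projection of a 3-D point after a rigid motion (R, t).

    Assumed model: X' = R X + t, then (u, v) = f * (x'/z', y'/z').
    """
    x, y, z = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    return (focal_length * x / z, focal_length * y / z)


def rms_discrepancy(points_3d, observed_2d, rotation, translation, focal_length):
    """Root-mean-square image distance between reprojected and observed points."""
    if not points_3d or len(points_3d) != len(observed_2d):
        raise ValueError("need matching, non-empty point lists")
    squared = []
    for point, (u_obs, v_obs) in zip(points_3d, observed_2d):
        u, v = project(point, rotation, translation, focal_length)
        squared.append((u - u_obs) ** 2 + (v - v_obs) ** 2)
    return math.sqrt(sum(squared) / len(squared))
```

Under these assumptions, a value near 0.5 from rms_discrepancy would correspond to the half-pixel figure quoted above, provided focal_length and the observed coordinates are both expressed in pixels.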