Active depth extraction using image streams

This paper presents the computation of 3D structure from motion, using a monocular sequence of images, within the paradigm of active vision. Robotic tasks such as navigation, manipulation, and object recognition all require a 3D description of the scene, but the description each task needs differs in resolution, accuracy, robustness, range, and time. A robotic system intended for a wide range of applications must therefore be able to actively control the imaging parameters so that it generates a 3D description sufficient for the task at hand. In the approach presented here, the 3D structure is determined in two steps. In the first step, analysis of the spatial and temporal gradients of an image stream yields a characterization of 3D information in terms of the camera displacements that result in a fixed disparity. In the second step, disparity values extrapolated between the first and last frames of the image stream are refined using normalized cross-correlation. The length of the image stream, the interframe camera displacement, and the disparity value are actively controlled to obtain 3D structure of the desired quality. The approach has been implemented in a pipeline-based computing environment to provide real-time performance. Extensive experiments have been conducted to verify its performance and capabilities.
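The refinement in the second step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single scanline per frame, integer disparities measured as a shift toward increasing pixel index, and a small search range around the extrapolated value. The names `ncc`, `refine_disparity`, `half_window`, and `search` are hypothetical and chosen only for this example.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences, in [-1, 1]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat patch: correlation is undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def refine_disparity(first_row, last_row, x, predicted, half_window=2, search=2):
    """Refine an extrapolated disparity at pixel x (hypothetical helper).

    Compares a window around x in the first frame with windows shifted by
    each candidate disparity in the last frame, and keeps the candidate
    with the highest NCC score.
    """
    if x - half_window < 0 or x + half_window + 1 > len(first_row):
        raise ValueError("window around x falls outside the first frame")
    patch = first_row[x - half_window : x + half_window + 1]

    best_d, best_score = predicted, -2.0  # any real NCC score beats -2
    for d in range(predicted - search, predicted + search + 1):
        lo = x + d - half_window
        hi = x + d + half_window + 1
        if lo < 0 or hi > len(last_row):
            continue  # candidate window falls outside the last frame
        score = ncc(patch, last_row[lo:hi])
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score
```

Calling `refine_disparity(first_row, last_row, x, predicted)` returns the best integer disparity together with its correlation score. A low score could serve as a cue to adjust the stream length or camera displacement, although how the paper uses the score for active control is not stated in the abstract.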
