Efficient hundreds-baseline stereo by counting interest points for moving omni-directional multi-camera system

In this article, we propose an efficient method for estimating a depth map from long-baseline image sequences captured by a calibrated moving multi-camera system. The concept is simple: we integrate counting of the total number of interest points (TNIP) in the images into the original multiple-baseline stereo framework. With this simple algorithm, depth can be determined without computing the similarity measures used in conventional stereo matching, such as SSD (sum of squared differences) and NCC (normalized cross-correlation). The proposed stereo algorithm is computationally efficient and robust to distortions and occlusions, and it has high affinity with omni-directional and multi-camera imaging. A naive TNIP-based method shows the expected trade-off between accuracy and efficiency. A hybrid approach that uses both TNIP and SSD removes most of this trade-off, achieving depth estimation that is both accurate and efficient. We have experimentally verified the validity and feasibility of the TNIP-based stereo algorithm on both synthetic and real outdoor scenes.
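To make the counting idea concrete, here is a minimal sketch of TNIP-style depth selection for a single reference pixel, under stated assumptions. The abstract does not specify the camera model, the matching tolerance, or exactly how TNIP and SSD are combined, so the following are assumptions: each camera of the omni-directional rig is modeled as a calibrated pinhole camera with X_cam = R X_world + t; interest points are given per frame as pixel coordinates; a frame "counts" when an interest point lies within `radius` pixels of the projected 3D point; and the hybrid variant keeps the `top_k` depths with the highest TNIP and re-ranks them by a caller-supplied SSD cost. All names (`Camera`, `tnip_score`, `estimate_depth_tnip`, `estimate_depth_hybrid`, `radius`, `top_k`, `ssd_cost`) are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Pixel = Tuple[float, float]


class Camera:
    """Hypothetical pinhole camera (assumption): X_cam = R @ X_world + t."""

    def __init__(self, fx, fy, cx, cy, R, t):
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy
        self.R, self.t = R, t  # R: 3x3 nested lists, t: length-3 list

    def project(self, X: Sequence[float]) -> Optional[Pixel]:
        # World -> camera coordinates, then perspective division.
        xc = [sum(self.R[i][j] * X[j] for j in range(3)) + self.t[i] for i in range(3)]
        if xc[2] <= 0:  # point behind the camera: not visible in this frame
            return None
        return (self.fx * xc[0] / xc[2] + self.cx, self.fy * xc[1] / xc[2] + self.cy)

    def backproject(self, u: float, v: float, depth: float) -> List[float]:
        # Point on the ray through pixel (u, v) whose camera-frame z equals `depth`.
        xc = [(u - self.cx) / self.fx * depth, (v - self.cy) / self.fy * depth, depth]
        d = [xc[i] - self.t[i] for i in range(3)]
        # X_world = R^T (X_cam - t)
        return [sum(self.R[j][i] * d[j] for j in range(3)) for i in range(3)]


def tnip_score(X, cams, keypoints, radius) -> int:
    """Count frames that have an interest point within `radius` px of X's projection."""
    count = 0
    for cam, kps in zip(cams, keypoints):
        p = cam.project(X)
        if p is None:
            continue
        if any(math.hypot(p[0] - kx, p[1] - ky) <= radius for kx, ky in kps):
            count += 1
    return count


def estimate_depth_tnip(ref, u, v, depths, cams, keypoints, radius=1.5):
    """Naive TNIP: the candidate depth whose projections hit the most interest points."""
    return max(depths, key=lambda d: tnip_score(ref.backproject(u, v, d), cams, keypoints, radius))


def estimate_depth_hybrid(ref, u, v, depths, cams, keypoints,
                          ssd_cost: Callable[[float], float], top_k=3, radius=1.5):
    """Hybrid sketch (assumed scheme): keep the top_k depths by TNIP, pick the minimum SSD."""
    ranked = sorted(depths, key=lambda d: -tnip_score(ref.backproject(u, v, d), cams, keypoints, radius))
    return min(ranked[:top_k], key=ssd_cost)


# Usage: reference camera at the origin, two more cameras translated along x.
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ref = Camera(100, 100, 50, 50, I, [0, 0, 0])
cams = [Camera(100, 100, 50, 50, I, [-1, 0, 0]), Camera(100, 100, 50, 50, I, [-2, 0, 0])]
keypoints = [[(40, 50), (35, 80)], [(20, 50), (10, 50)]]  # per-frame interest points
depths = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(estimate_depth_tnip(ref, 60, 50, depths, cams, keypoints))  # 5.0
```

In this toy scene the true point lies at depth 5, where both other frames have an interest point at the projection. At depth 4 only a distractor in the third frame matches, so the count is 1. The sketch shows why no patch similarity is needed in the naive method: each depth hypothesis is scored by counting alone. The hybrid variant evaluates the more expensive SSD only on the few surviving candidates, which is one plausible way to obtain the accuracy/efficiency balance the abstract describes.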
