StereoScan: Dense 3d reconstruction in real-time

Accurate 3d perception from video sequences is a core subject in computer vision and robotics, since it forms the basis of subsequent scene analysis. In practice, however, online requirements often severely limit the usable camera resolution and hence also the reconstruction accuracy. Furthermore, real-time systems often rely on heavy parallelism, which can prevent applications in mobile devices or driver assistance systems, especially in cases where FPGAs cannot be employed. This paper proposes a novel approach to building 3d maps from high-resolution stereo sequences in real-time. Inspired by recent progress in stereo matching, we propose a sparse feature matcher in conjunction with an efficient and robust visual odometry algorithm. Our reconstruction pipeline combines both techniques with efficient stereo matching and a multi-view linking scheme for generating consistent 3d point clouds. In our experiments we show that the proposed odometry method achieves state-of-the-art accuracy. Including feature matching, the visual odometry part of our algorithm runs at 25 frames per second, while at the same time we obtain new depth maps at 3-4 fps, which is sufficient for online 3d reconstruction.
