Monocular 3D structure estimation for urban scenes

We propose a 3D structure estimation framework that adopts the slanted-plane representation in order to provide a dense estimate. The proposed approach fuses a sparse reconstructed 3D point cloud, obtained using several feature matching methods, with noisy dense optical flow in order to achieve accurate structure fitting and visually appealing results. We formulate the problem as a weighted total least squares model that takes into account the occlusion boundaries between neighboring planes. We also propose an extended flow-based superpixel segmentation that adapts to the density of the sparse feature points, yielding a more balanced reconstruction. To validate our approach, we present 3D models obtained on the KITTI dataset [3] and compare them with those produced by other methods.
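As an illustration of the plane-fitting step, the following is a minimal sketch of a weighted total least squares fit of a single plane to 3D points, assuming the common orthogonal-distance formulation. It is not the paper's full model: the occlusion-boundary terms between neighboring planes are omitted, and the function name `fit_plane_wtls` and the choice of per-point weights are hypothetical.

```python
import numpy as np


def fit_plane_wtls(points, weights):
    """Fit a plane n.p + d = 0 (with |n| = 1) to weighted 3D points.

    Minimizes sum_i w_i * (n.p_i + d)^2, i.e. the weighted sum of squared
    orthogonal distances. Illustrative only; the weighting scheme is an
    assumption, not taken from the paper.
    """
    points = np.asarray(points, dtype=float)    # shape (N, 3)
    weights = np.asarray(weights, dtype=float)  # shape (N,)

    # The optimal plane passes through the weighted centroid.
    centroid = (weights[:, None] * points).sum(axis=0) / weights.sum()
    centered = points - centroid

    # Weighted 3x3 scatter matrix of the centered points.
    scatter = (weights[:, None] * centered).T @ centered

    # eigh returns eigenvalues in ascending order, so the first eigenvector
    # is the direction of least weighted variance: the plane normal.
    _, eigvecs = np.linalg.eigh(scatter)
    normal = eigvecs[:, 0]
    offset = -normal @ centroid
    return normal, offset


# Example: points of one superpixel, with lower weights for less reliable
# (for instance flow-derived) points.
pts = [[0, 0, 2], [1, 0, 2.5], [0, 1, 2], [1, 1, 2.5], [2, 3, 3]]
w = [1.0, 1.0, 2.0, 1.0, 0.5]
normal, offset = fit_plane_wtls(pts, w)
```

In a setting like the one described above, the weights could encode how much each point is trusted, so that sparse feature-based points dominate the fit while noisy dense-flow points only regularize it.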

[1]  David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[2]  Judith Dijk, et al. Closed Form Solution for the Scale Ambiguity Problem in Monocular Visual Odometry, 2010, ICIRA.

[3]  Andreas Geiger, et al. Are we ready for autonomous driving? The KITTI vision benchmark suite, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Maarten Vergauwen, et al. Web-based 3D Reconstruction Service, 2006, Machine Vision and Applications.

[5]  Gary R. Bradski, et al. ORB: An efficient alternative to SIFT or SURF, 2011, 2011 International Conference on Computer Vision.

[6]  Luc Van Gool, et al. Real-time stereo and flow-based video segmentation with superpixels, 2012, 2012 IEEE Workshop on the Applications of Computer Vision (WACV).

[7]  Steven M. Seitz, et al. Photo tourism: exploring photo collections in 3D, 2006, ACM Trans. Graph.

[8]  Maxime Lhuillier, et al. Genus refinement of a manifold surface reconstructed by sculpting the 3d-Delaunay triangulation of Structure-from-Motion points, 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[9]  Cristian Sminchisescu, et al. Efficient Closed-Form Solution to Generalized Boundary Detection, 2012, ECCV.

[10]  Pascal Fua, et al. SLIC Superpixels Compared to State-of-the-Art Superpixel Methods, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Michael M. Kazhdan, et al. Poisson surface reconstruction, 2006, SGP '06.

[12]  Luc Van Gool, et al. 3D Urban Scene Modeling Integrating Recognition and Reconstruction, 2008, International Journal of Computer Vision.

[13]  Zoltan-Csaba Marton, et al. On Fast Surface Reconstruction Methods for Large and Noisy Datasets, 2009, IEEE International Conference on Robotics and Automation.

[14]  Jana Kosecka, et al. Multi-view Superpixel Stereo in Urban Environments, 2010, International Journal of Computer Vision.

[15]  Reinhard Koch, et al. Visual Modeling with a Hand-Held Camera, 2004, International Journal of Computer Vision.

[16]  Jean Ponce, et al. Accurate, Dense, and Robust Multiview Stereopsis, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Bernhard P. Wrobel, et al. Multiple View Geometry in Computer Vision, 2001.

[18]  Luc Van Gool, et al. SURF: Speeded Up Robust Features, 2006, ECCV.

[19]  Julius Ziegler, et al. StereoScan: Dense 3d reconstruction in real-time, 2011, 2011 IEEE Intelligent Vehicles Symposium (IV).

[20]  Manolis I. A. Lourakis, et al. SBA: A software package for generic sparse bundle adjustment, 2009, TOMS.

[21]  Michael J. Black, et al. Secrets of optical flow estimation and their principles, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.