Urban 3D semantic modelling using stereo vision

In this paper we propose a robust algorithm that efficiently generates an accurate dense 3D reconstruction with associated semantic labellings. Intelligent autonomous systems require accurate 3D reconstructions for applications such as navigation and localisation. Such systems also need to recognise their surroundings in order to identify and interact with objects of interest. Considerable emphasis has been placed on generating a good reconstruction, but less effort has gone into generating a 3D semantic model. The inputs to our algorithm are street-level stereo image pairs acquired from a camera mounted on a moving vehicle. The depth maps, generated from the stereo pairs across time, are fused online into a global 3D volume in order to accommodate arbitrarily long image sequences. The street-level images are automatically labelled using a Conditional Random Field (CRF) framework that exploits the stereo images, and the label estimates are aggregated to annotate the 3D volume. We evaluate our approach on the KITTI odometry dataset, for which we have manually generated ground truth for object class segmentation. We evaluate qualitatively on various sequences of the dataset and report quantitative results on a representative subset.
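The fusion and label-aggregation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a truncated signed distance representation fused by a weighted running average, and a simple per-voxel majority vote for the labels. The class `SemanticVoxelGrid`, its methods, and the truncation value are all hypothetical names and parameters chosen for illustration.

```python
from collections import Counter, defaultdict


class SemanticVoxelGrid:
    """Sparse voxel grid holding a fused signed distance and label votes per voxel.

    Illustrative only: the fusion rule (weighted running average) and the
    aggregation rule (majority vote) are assumptions, not the paper's method.
    """

    def __init__(self, truncation=0.3):
        self.truncation = truncation        # hypothetical truncation band, in metres
        self.sdf = {}                       # voxel index -> fused truncated signed distance
        self.weight = defaultdict(float)    # voxel index -> accumulated weight
        self.votes = defaultdict(Counter)   # voxel index -> label vote counts

    def integrate(self, voxel, signed_distance, label, w=1.0):
        """Fuse one per-frame observation of a voxel into the global volume."""
        # Clamp the measured distance to the truncation band.
        d = max(-self.truncation, min(self.truncation, signed_distance))
        w_old = self.weight[voxel]
        d_old = self.sdf.get(voxel, 0.0)
        # Weighted running average, so frames can be added one at a time (online).
        self.sdf[voxel] = (w_old * d_old + w * d) / (w_old + w)
        self.weight[voxel] = w_old + w
        # Each frame's image label casts one vote for this voxel.
        self.votes[voxel][label] += 1

    def label_of(self, voxel):
        """Return the most frequently observed label, or None if never observed."""
        counts = self.votes.get(voxel)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


# Three frames observe the same voxel; the last distance is clamped to 0.3.
grid = SemanticVoxelGrid(truncation=0.3)
v = (10, 4, 7)
grid.integrate(v, 0.10, "road")
grid.integrate(v, 0.20, "road")
grid.integrate(v, 0.90, "pavement")
print(grid.label_of(v))  # road
```

Because each voxel keeps only a running average and a vote count, memory does not grow with the number of frames, which is one way an online scheme can handle arbitrarily long sequences.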
