Nonparametric semantic segmentation for 3D street scenes

In this paper we propose a method to generate a large scale and accurate dense 3D semantic map of street scenes. A dense 3D semantic model of the environment can significantly improve a number of robotic applications such as autonomous driving, navigation or localisation. Instead of using offline trained classifiers for semantic segmentation, our approach employs a data-driven, nonparametric method to parse scenes which easily scale to a large environment and generalise to different scenes. We use stereo image pairs collected from cameras mounted on a moving car to produce dense depth maps which are combined into a global 3D reconstruction using camera poses from stereo visual odometry. Simultaneously, 2D automatic semantic segmentation using a nonparametric scene parsing method is fused into the 3D model. Furthermore, the resultant 3D semantic model is improved with the consideration of moving objects in the scene. We demonstrate our method on the publicly available KITTI dataset and evaluate the performance against manually generated ground truth.

[1]  Wolfram Burgard,et al.  OctoMap: an efficient probabilistic 3D mapping framework based on octrees , 2013, Autonomous Robots.

[2]  Marie-Pierre Jolly,et al.  Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[3]  Vladimir Kolmogorov,et al.  An experimental comparison of min-cut/max- flow algorithms for energy minimization in vision , 2001, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[5]  Vladimir Kolmogorov,et al.  An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision , 2004, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Marc Pollefeys,et al.  Stereo depth map fusion for robot navigation , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[7]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Olivier Stasse,et al.  MonoSLAM: Real-Time Single Camera SLAM , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Different Scenes , 2008, ECCV.

[10]  Antonio Torralba,et al.  Nonparametric Scene Parsing via Label Transfer , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Antonio Torralba,et al.  Contextual Priming for Object Detection , 2003, International Journal of Computer Vision.

[12]  D. Fox,et al.  Classification and Semantic Mapping of Urban Environments , 2011, Int. J. Robotics Res..

[13]  Mark Joseph Cummins,et al.  Probabilistic localization and mapping in appearance space , 2009 .

[14]  Ali Shahrokni,et al.  Urban 3D semantic modelling using stereo vision , 2013, 2013 IEEE International Conference on Robotics and Automation.

[15]  Roberto Cipolla,et al.  Segmentation and Recognition Using Structure from Motion Point Clouds , 2008, ECCV.

[16]  Andreas Geiger,et al.  Efficient Large-Scale Stereo Matching , 2010, ACCV.

[17]  TorralbaAntonio,et al.  Nonparametric Scene Parsing via Label Transfer , 2011 .

[18]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[19]  Vincent Lepetit,et al.  View-based Maps , 2010, Int. J. Robotics Res..

[20]  Pushmeet Kohli,et al.  Associative hierarchical CRFs for object class image segmentation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[21]  Friedrich Fraundorfer,et al.  Visual Odometry Part I: The First 30 Years and Fundamentals , 2022 .

[22]  Andreas Geiger,et al.  Visual odometry based on stereo image sequences with RANSAC-based outlier rejection scheme , 2010, 2010 IEEE Intelligent Vehicles Symposium.

[23]  Julius Ziegler,et al.  StereoScan: Dense 3d reconstruction in real-time , 2011, 2011 IEEE Intelligent Vehicles Symposium (IV).

[24]  Svetlana Lazebnik,et al.  Superparsing , 2010, International Journal of Computer Vision.

[25]  Philip H. S. Torr,et al.  Combining Appearance and Structure from Motion Features for Road Scene Understanding , 2009, BMVC.

[26]  James J. Little,et al.  Mobile Robot Localization and Mapping with Uncertainty using Scale-Invariant Visual Landmarks , 2002, Int. J. Robotics Res..

[27]  C. V. Jawahar,et al.  Scene Text Recognition using Higher Order Language Priors , 2009, BMVC.