Low-level fusion of color, texture and depth for robust road scene understanding

We propose a novel approach to pixel-level semantic labeling that aims to rapidly infer the coarse layout of street scenes by jointly evaluating color, texture and depth information in a randomized decision forest. The recovered pixel-level class probability maps provide a general-purpose basis to guide more elaborate vision algorithms. To demonstrate the richness of our labeling, we extend the well-known Stixel model to use the semantic labels as input cues. In addition, we employ the generated low-level information as an attention mechanism for a vehicle detector. In both cases, recognition performance and accuracy improve significantly. In our experimental evaluation on the public KITTI benchmark, we thoroughly study the characteristics of the different feature channels and their contribution to the overall pixel-level labeling result. Our results underline that combining several orthogonal feature channels in a joint model is key to superior performance. This improvement comes at little additional cost, since our approach operates at 100 Hz in a GPU implementation.
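
The idea of low-level fusion can be illustrated with a minimal sketch: color, texture and depth values are concatenated into one feature vector per pixel, and a set of randomized trees votes on a class probability distribution for that pixel. This is not the implementation described above. The class names, the toy pixel values, the helper names (`fuse`, `train_stump`, `predict_proba`) and the use of depth-1 trees with random thresholds are all assumptions chosen to keep the example short.

```python
import random
from collections import Counter

# Hypothetical label set; the actual classes are not given in the text.
CLASSES = ("ground", "vehicle", "sky")

def fuse(color, texture, depth):
    """Low-level fusion: concatenate all channels into one vector per pixel."""
    return [list(c) + list(t) + [d] for c, t, d in zip(color, texture, depth)]

def train_stump(features, labels, rng):
    """Train one randomized depth-1 tree (random feature, random threshold)."""
    dim = rng.randrange(len(features[0]))
    values = [f[dim] for f in features]
    thr = rng.uniform(min(values), max(values))
    leaves = {}
    for side in (False, True):
        hits = Counter(l for f, l in zip(features, labels) if (f[dim] > thr) == side)
        total = sum(hits.values())
        # An empty leaf falls back to a uniform distribution.
        leaves[side] = [hits[c] / total if total else 1 / len(CLASSES) for c in CLASSES]
    return dim, thr, leaves

def predict_proba(forest, feature):
    """Average the leaf class distributions over all trees in the forest."""
    probs = [0.0] * len(CLASSES)
    for dim, thr, leaves in forest:
        for i, p in enumerate(leaves[feature[dim] > thr]):
            probs[i] += p / len(forest)
    return probs

# Toy data for three pixels (made-up values, for illustration only).
color = [(0.3, 0.3, 0.3), (0.8, 0.1, 0.1), (0.5, 0.7, 0.9)]
texture = [(0.1,), (0.6,), (0.05,)]
depth = [5.0, 12.0, 80.0]
labels = ["ground", "vehicle", "sky"]

features = fuse(color, texture, depth)
rng = random.Random(0)
forest = [train_stump(features, labels, rng) for _ in range(50)]
probs = predict_proba(forest, features[1])
print(dict(zip(CLASSES, probs)))
```

Because every tree draws its split feature at random from the fused vector, some trees split on color, others on texture or depth, and the averaged output is a per-pixel class probability vector of the kind that could be passed on to later processing stages.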
