Fast traffic scene segmentation using multi-range features from multi-resolution filtered and spatial context channels

In this paper, we describe a fast solution for the semantic segmentation of traffic scenarios. We propose a multi-resolution filtering scheme over LUV + HOG image channels that produces high-pass and low-pass filtered channels at multiple scales. To add spatial context, we extend the filtered channels with horizontal and vertical position channels. We introduce multi-range classification features that capture both local structure and context, enabling fast semantic segmentation of traffic scenarios. A binary boosting-based pixel classifier is trained for each semantic class, and these classifiers provide the unary potential term of a dense Conditional Random Field. We evaluate the proposed solution on the CamVid traffic scene segmentation benchmark and achieve state-of-the-art results at 25 FPS, making it the fastest among the top-performing approaches.
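The following minimal Python sketch shows how such a feature pipeline can be organised. It is not the authors' implementation. It assumes several things the abstract does not specify: an integral-image box mean serves as the low-pass filter, the residual (channel minus its low-pass version) serves as the high-pass filter, and the radii `(1, 2, 4)` stand in for the "multiple scales". The `multi_range_features` function is a hypothetical reading of "multi-range" features: each channel is sampled at the pixel and at four offsets per range. Computing the LUV + HOG input channels, training the boosted classifiers, and running dense CRF inference are not shown. A toy single-channel image takes the place of those channels.

```python
from typing import List, Sequence

Channel = List[List[float]]


def integral_image(ch: Channel) -> Channel:
    """Summed-area table with a zero border: s[y][x] = sum of ch over rows < y, cols < x."""
    h, w = len(ch), len(ch[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += ch[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    return s


def box_lowpass(ch: Channel, r: int) -> Channel:
    """Assumed low-pass filter: mean over a (2r+1)x(2r+1) window, clipped at the border."""
    h, w = len(ch), len(ch[0])
    s = integral_image(ch)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


def highpass(ch: Channel, low: Channel) -> Channel:
    """Assumed high-pass filter: the residual of the channel minus its low-pass version."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(ch, low)]


def filtered_channels(ch: Channel, radii: Sequence[int] = (1, 2, 4)) -> List[Channel]:
    """Low-pass and high-pass versions of one input channel at several (assumed) scales."""
    out: List[Channel] = []
    for r in radii:
        low = box_lowpass(ch, r)
        out.append(low)
        out.append(highpass(ch, low))
    return out


def position_channels(h: int, w: int) -> List[Channel]:
    """Horizontal and vertical position channels, normalised to [0, 1]."""
    xs = [[x / max(w - 1, 1) for x in range(w)] for _ in range(h)]
    ys = [[y / max(h - 1, 1) for _ in range(w)] for y in range(h)]
    return [xs, ys]


def multi_range_features(channels: List[Channel], y: int, x: int,
                         ranges: Sequence[int] = (0, 2, 8)) -> List[float]:
    """Hypothetical multi-range descriptor: every channel is sampled at the pixel
    (range 0) and at four axis-aligned offsets per nonzero range, clamped to the image."""
    h, w = len(channels[0]), len(channels[0][0])
    feats: List[float] = []
    for ch in channels:
        for d in ranges:
            offsets = [(0, 0)] if d == 0 else [(-d, 0), (d, 0), (0, -d), (0, d)]
            for dy, dx in offsets:
                yy = min(max(y + dy, 0), h - 1)
                xx = min(max(x + dx, 0), w - 1)
                feats.append(ch[yy][xx])
    return feats


if __name__ == "__main__":
    # Toy 12x16 single channel standing in for the LUV + HOG channels.
    img = [[float((x + y) % 5) for x in range(16)] for y in range(12)]
    chans = filtered_channels(img) + position_channels(12, 16)
    f = multi_range_features(chans, 6, 8)
    print(len(chans), len(f))  # prints: 8 72
```

In a full system, a descriptor like this would be computed per pixel and fed to one binary boosted classifier per class. The class scores would then serve as unary potentials in the dense CRF. Box filters over integral images are used here because their cost per pixel does not depend on the radius, which matches the speed goal stated in the abstract.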
