Fast road scene segmentation using deep learning and scene-based models

Pixel-labeling approaches based on semantic segmentation play an important role in road scene understanding. In recent years, deep learning approaches such as the deconvolutional neural network have been used for semantic segmentation and have obtained state-of-the-art results. However, the resulting segmentations delineate object boundaries poorly. In this paper, we adopt the deconvolutional neural network to perform semantic segmentation of the road scene using colour and depth information. We then improve the network's limited object delineation, within a computationally efficient framework, using novel features that are learnt at the pixel level and the patch level for different road scenes. The patch-level features represent the road scene geometry, while the pixel-level features represent appearance and depth information. The features learnt for each road scene are indexed with that scene's pre-defined label. Following the indexing, a random forest classifier is trained to retrieve the relevant geometric and appearance-depth features for a given road scene. The retrieved features are then used to refine the error regions identified in the initial semantic segmentation estimate. Our proposed algorithm is evaluated on an acquired dataset and compared with state-of-the-art baseline algorithms. We also perform a detailed parametric evaluation of the proposed framework. The experimental results show that the proposed algorithm achieves higher segmentation accuracy than the baselines.
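
The retrieve-then-refine flow described above can be illustrated with a minimal sketch. It is not the paper's implementation, and everything in it is an assumption made for illustration: the function names, the toy scene descriptors, the per-scene label priors standing in for the learnt patch-level features, and the confidence threshold used to mark error regions (the abstract does not say how error regions are identified). To keep the example dependency-free, a nearest-centroid classifier is swapped in for the random forest.

```python
from collections import defaultdict
from math import dist


def train_scene_classifier(descriptors, labels):
    """Stand-in for the random forest: store one mean descriptor per scene label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(descriptors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict_scene(centroids, descriptor):
    """Return the scene label whose mean descriptor is closest to the query."""
    return min(centroids, key=lambda label: dist(centroids[label], descriptor))


def refine(initial_labels, confidence, scene_prior, threshold=0.5):
    """Replace low-confidence labels (assumed error regions) with the scene prior."""
    refined = []
    for row_labels, row_conf, row_prior in zip(initial_labels, confidence, scene_prior):
        refined.append([
            prior if conf < threshold else label
            for label, conf, prior in zip(row_labels, row_conf, row_prior)
        ])
    return refined


# Hypothetical feature index: scene label -> per-patch label prior (2x2 grid).
feature_index = {
    "urban": [["building", "building"], ["road", "road"]],
    "highway": [["sky", "sky"], ["road", "road"]],
}

# Hypothetical global scene descriptors with their pre-defined scene labels.
train_descriptors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["urban", "urban", "highway", "highway"]
centroids = train_scene_classifier(train_descriptors, train_labels)

# Initial network output for a new image, with per-patch confidence.
initial = [["sky", "building"], ["road", "car"]]
confidence = [[0.3, 0.9], [0.8, 0.9]]

scene = predict_scene(centroids, [0.7, 0.3])
refined = refine(initial, confidence, feature_index[scene])
print(scene, refined)
```

In this toy run only the low-confidence top-left patch is overwritten by the retrieved prior; confident predictions, including the car, are left untouched. A real system would replace the label priors with the learnt geometric and appearance-depth features and the centroid lookup with a trained random forest.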
