Self-paced cross-modality transfer learning for efficient road segmentation

Accurate road segmentation is a prerequisite for autonomous driving. Current state-of-the-art methods are mostly based on convolutional neural networks (CNNs). Nevertheless, their good performance comes at the expense of large amounts of annotated data and high computational cost. In this work, we address these two issues with a self-paced cross-modality transfer learning framework built on an efficient projection CNN. Specifically, with the help of stereo images, we first tackle a related but easier task, i.e. free-space detection, using well-developed unsupervised methods. Then, we transfer this useful but noisy knowledge from the depth modality to the single RGB modality through self-paced CNN learning. Finally, we only need to fine-tune the CNN with a few annotated images to obtain good performance. In addition, we propose an efficient projection CNN, which can improve fine-grained segmentation results at little additional cost. We evaluate our method on the KITTI road benchmark, where it surpasses all published methods while running at 15 fps.
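The self-paced transfer step can be illustrated with the classic hard-weighting scheme of self-paced learning (Kumar, Packer and Koller, NIPS 2010): minimizing Σ v_i L_i − λ Σ v_i over binary weights v_i ∈ {0, 1} gives v_i = 1 exactly when L_i < λ, so only "easy" (low-loss) samples contribute, and raising λ gradually admits harder ones. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes per-pixel binary cross-entropy against noisy road/non-road pseudo-labels produced by stereo free-space detection, and the function names (`bce`, `self_paced_weights`, `self_paced_loss`) and the doubling schedule for λ are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy of a predicted road probability p against a pseudo-label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def self_paced_weights(losses: Sequence[float], lam: float) -> List[int]:
    """Closed-form minimiser of sum(v_i * L_i) - lam * sum(v_i) over v_i in {0, 1}:
    a sample is kept (v_i = 1) only if its loss is below the pace parameter lam."""
    return [1 if loss < lam else 0 for loss in losses]


def self_paced_loss(
    probs: Sequence[float], pseudo_labels: Sequence[int], lam: float
) -> Tuple[float, List[int]]:
    """Mean BCE over the pixels selected by the self-paced weights.

    Returns (loss, weights); the loss is 0.0 if no pixel is easy enough yet.
    """
    losses = [bce(p, y) for p, y in zip(probs, pseudo_labels)]
    v = self_paced_weights(losses, lam)
    n = sum(v)
    if n == 0:
        return 0.0, v
    return sum(vi * li for vi, li in zip(v, losses)) / n, v


if __name__ == "__main__":
    probs = [0.9, 0.2, 0.6, 0.05]  # toy network outputs for four pixels
    labels = [1, 0, 1, 1]          # toy stereo pseudo-labels; the last one looks noisy
    lam = 0.3                      # hypothetical initial pace parameter
    for round_idx in range(3):
        loss, v = self_paced_loss(probs, labels, lam)
        print(round_idx, lam, v, round(loss, 4))
        lam *= 2.0                 # hypothetical annealing: admit harder pixels later
```

With this weighting, a pixel whose pseudo-label strongly disagrees with the network (such as the last toy pixel) is excluded until λ grows past its loss, which is the intuition for why self-paced learning tolerates noisy cross-modality labels.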