PPANet: Point-Wise Pyramid Attention Network for Semantic Segmentation

In recent years, convolutional neural networks (CNNs) have been at the centre of progress in advanced driver assistance systems and autonomous driving. This paper presents a point-wise pyramid attention network, PPANet, which employs an encoder-decoder approach for semantic segmentation. Specifically, the encoder adopts a novel squeeze nonbottleneck module as its base module to extract feature representations, where squeeze and expansion operations are used to obtain high segmentation accuracy. An upsampling module serves as the decoder; its purpose is to recover the pixel-wise representations lost in the encoding part. The middle part consists of two modules connected in parallel: a point-wise pyramid attention (PPA) module and an attention-like module. The PPA module is proposed to utilize contextual information effectively. Furthermore, we developed a loss function that combines dice loss and binary cross-entropy to improve accuracy and speed up training convergence on KITTI road segmentation. We conducted training and testing experiments on the KITTI road segmentation and CamVid datasets, and the evaluation results demonstrate the effectiveness of the proposed method for road semantic segmentation.
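
As an illustration of the combined loss mentioned above, the following is a minimal NumPy sketch of a dice plus binary cross-entropy objective for a binary road mask. The abstract does not state how the two terms are weighted or smoothed, so the function names, the `bce_weight` of 0.5, and the `smooth` and `eps` constants are assumptions made for this example, not the paper's settings.

```python
import numpy as np


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over all pixels; probs are in [0, 1]."""
    p = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)))


def dice_loss(probs, targets, smooth=1.0):
    """Soft dice loss: 1 - (2|P.T| + s) / (|P| + |T| + s)."""
    intersection = np.sum(probs * targets)
    denom = np.sum(probs) + np.sum(targets)
    return float(1.0 - (2.0 * intersection + smooth) / (denom + smooth))


def combined_loss(probs, targets, bce_weight=0.5):
    """Weighted sum of BCE and dice; the 0.5 weight is an assumed value."""
    bce = bce_loss(probs, targets)
    dice = dice_loss(probs, targets)
    return bce_weight * bce + (1.0 - bce_weight) * dice


# Toy 4x4 road mask: the left half is road (1), the right half is background (0).
mask = np.zeros((4, 4))
mask[:, :2] = 1.0

perfect = combined_loss(mask, mask)        # prediction equals the ground truth
inverted = combined_loss(1.0 - mask, mask)  # prediction is exactly wrong
```

Here `perfect` is close to zero while `inverted` is large. The dice term measures region overlap and is less sensitive to the road/background pixel imbalance, whereas the cross-entropy term supplies a per-pixel gradient; that pairing is one common motivation for combining the two.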
