Context-aware Padding for Semantic Segmentation

Zero padding is widely used in convolutional neural networks to keep feature maps from shrinking too quickly. However, it has been claimed to disturb the statistics at the border [18]. As an alternative, we propose a context-aware (CA) padding approach to extend the image. We reformulate the padding problem as an image extrapolation problem and illustrate its effects on the semantic segmentation task. With context-aware padding, a ResNet-based segmentation model achieves a higher mean Intersection-over-Union than with traditional zero padding on both the Cityscapes dataset and the DeepGlobe satellite imaging challenge dataset. Furthermore, our padding adds no noticeable overhead during training or testing.
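The border effect attributed to zero padding can be illustrated with a small NumPy sketch. This is not the CA padding proposed here: the abstract does not describe that method in enough detail to reproduce, so reflect padding is used purely as a stand-in for "extending the image with plausible content". The helper name `box_filter_same` and the 6x6 constant test image are assumptions made for illustration.

```python
import numpy as np

def box_filter_same(image, pad_mode):
    """Apply a 3x3 mean filter with 'same' output size.

    pad_mode is passed to np.pad: "constant" gives zero padding,
    "reflect" mirrors the image content across the border.
    """
    if pad_mode == "constant":
        padded = np.pad(image, 1, mode="constant", constant_values=0.0)
    else:
        padded = np.pad(image, 1, mode=pad_mode)
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    # Sum the nine shifted views of the padded image, then average.
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0

# Hypothetical input: a constant image, so any deviation in the
# filtered output is caused by the padding alone.
image = np.full((6, 6), 1.0)

zero_out = box_filter_same(image, "constant")
reflect_out = box_filter_same(image, "reflect")

print("zero padding, corner / edge / interior:",
      zero_out[0, 0], zero_out[0, 2], zero_out[2, 2])
print("reflect padding, corner / edge / interior:",
      reflect_out[0, 0], reflect_out[0, 2], reflect_out[2, 2])
```

On this constant input, zero padding lowers the filter response at corners and edges while the interior is unaffected, whereas the reflect-padded response stays uniform. A learned extrapolation would aim for a similar effect on natural images, where simple mirroring may not yield plausible content.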

[1]  William T. Freeman,et al.  Boundless: Generative Adversarial Networks for Image Extension , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Alex Graves,et al.  Conditional Image Generation with PixelCNN Decoders , 2016, NIPS.

[3]  Ting-Chun Wang,et al.  Partial Convolution based Padding , 2018, ArXiv.

[4]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jing Huang,et al.  DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[6]  Shuang Wu,et al.  Convolution with even-sized kernels and symmetric padding , 2019, NeurIPS.

[7]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[8]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Min Sun,et al.  Omnidirectional CNN for Visual Place Recognition and Navigation , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[10]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Xiaojuan Qi,et al.  ICNet for Real-Time Semantic Segmentation on High-Resolution Images , 2017, ECCV.

[12]  Ronan Collobert,et al.  Learning to Refine Object Segments , 2016, ECCV.

[13]  Koray Kavukcuoglu,et al.  Pixel Recurrent Neural Networks , 2016, ICML.

[14]  Steven M. Drucker,et al.  Quality prediction for image completion , 2012, ACM Trans. Graph.

[15]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[17]  Jingdong Wang,et al.  OCNet: Object Context Network for Scene Parsing , 2018, ArXiv.

[18]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[19]  Xiaoou Tang,et al.  Video Frame Synthesis Using Deep Voxel Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Shuicheng Yan,et al.  Very Long Natural Scenery Image Prediction by Outpainting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[22]  Yi Yang,et al.  Attention to Scale: Scale-Aware Semantic Image Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Jiaya Jia,et al.  Wide-Context Semantic Image Extrapolation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[25]  Yinda Zhang,et al.  FrameBreak: Dramatic Image Extrapolation by Guided Shift-Maps , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Sewoong Ahn,et al.  Distribution Padding in Convolutional Neural Networks , 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[28]  Shi-Min Hu,et al.  Deep Portrait Image Completion and Extrapolation , 2018, IEEE Transactions on Image Processing.

[29]  Silvio Savarese,et al.  Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  José García Rodríguez,et al.  A Review on Deep Learning Techniques Applied to Semantic Segmentation , 2017, ArXiv.

[31]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Johann Marius Zöllner,et al.  Scan-based Semantic Segmentation of LiDAR Point Clouds: An Experimental Study , 2020, 2020 IEEE Intelligent Vehicles Symposium (IV).

[34]  Xiaogang Wang,et al.  Context Encoding for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Ralph R. Martin,et al.  BiggerPicture: data-driven image extrapolation using graph matching , 2014, ACM Trans. Graph.