Two-Stage Framework for Faster Semantic Segmentation

Semantic segmentation consists of assigning each pixel of an image to one of a set of classes. Conventional models spend as much effort on easy-to-segment pixels as on hard-to-segment ones. This is inefficient, especially when deploying in compute-constrained settings. In this work, we propose a framework in which the model first produces a rough segmentation of the image and then refines only the patches estimated to be hard to segment. The framework is evaluated on four datasets (autonomous driving and biomedical) across four state-of-the-art architectures. Our method speeds up inference by a factor of four, with additional gains in training time, at the cost of some output quality.
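The coarse-then-refine idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two model functions are toy stand-ins, and the hardness score (mean per-patch uncertainty of the coarse prediction), the patch size, the downsampling factor, and the number of refined patches are all assumptions chosen for illustration. It also assumes a binary task and image dimensions divisible by both `patch` and `scale`.

```python
import numpy as np

def coarse_model(image):
    """Hypothetical stage-1 model: soft foreground probability per pixel."""
    return 1.0 / (1.0 + np.exp(-10.0 * (image - 0.5)))

def refine_model(patch_pixels):
    """Hypothetical stage-2 model: sharper prediction on a full-resolution patch."""
    return 1.0 / (1.0 + np.exp(-50.0 * (patch_pixels - 0.5)))

def two_stage_segment(image, patch=8, scale=4, top_k=4):
    h, w = image.shape
    # Stage 1: predict on a downsampled image, then upsample (nearest neighbour).
    low_res = image[::scale, ::scale]
    prob = np.kron(coarse_model(low_res), np.ones((scale, scale)))

    # Assumed hardness score: mean uncertainty per patch (1 when p = 0.5, 0 when p is 0 or 1).
    uncertainty = 1.0 - np.abs(2.0 * prob - 1.0)
    grid_h, grid_w = h // patch, w // patch
    scores = uncertainty.reshape(grid_h, patch, grid_w, patch).mean(axis=(1, 3))

    # Stage 2: refine only the top_k hardest patches at full resolution.
    hardest = np.argsort(scores.ravel())[::-1][:top_k]
    refined = []
    for idx in hardest:
        i, j = divmod(int(idx), grid_w)
        ys, xs = i * patch, j * patch
        prob[ys:ys + patch, xs:xs + patch] = refine_model(
            image[ys:ys + patch, xs:xs + patch]
        )
        refined.append((i, j))
    return prob > 0.5, refined

# Usage on a random 32x32 "image" with values in [0, 1].
rng = np.random.default_rng(0)
image = rng.random((32, 32))
mask, refined = two_stage_segment(image)
print(mask.shape, len(refined))
```

In this sketch the second stage touches only `top_k` of the 16 patches, which is where the saving would come from; the remaining patches keep the cheaper coarse prediction, which is also where some output quality is given up.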
