The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation

State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions.,,,,,, Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion then the network will be more accurate and easier to train.,,,,,, In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module nor pretraining. Moreover, due to smart construction of the model, our approach has much less parameters than currently published best entries for these datasets.

[1]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Serge J. Belongie,et al.  Residual Networks are Exponential Ensembles of Relatively Shallow Networks , 2016, ArXiv.

[5]  Christopher Joseph Pal,et al.  The Importance of Skip Connections in Biomedical Image Segmentation , 2016, LABELS/DLMIA@MICCAI.

[6]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[7]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[8]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[9]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[10]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[11]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[12]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[14]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Roberto Cipolla,et al.  Segmentation and Recognition Using Structure from Motion Point Clouds , 2008, ECCV.

[16]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[17]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Antonio Torralba,et al.  Learning Aligned Cross-Modal Representations from Weakly Aligned Data , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Joost van de Weijer,et al.  Unrolling Loopy Top-Down Semantic Feedback in Convolutional Deep Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[21]  Colin Raffel,et al.  Lasagne: First release. , 2015 .

[22]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[23]  Vladlen Koltun,et al.  Feature Space Optimization for Semantic Video Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  John Salvatier,et al.  Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.

[25]  Roberto Cipolla,et al.  Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding , 2015, BMVC.

[26]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Lorenzo Torresani,et al.  Deep End2End Voxel2Voxel Prediction , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[28]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[31]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Yoshua Bengio,et al.  ReSeg: A Recurrent Neural Network-Based Model for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[33]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Irfan A. Essa,et al.  Geometric Context from Videos , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.