Semantic segmentation using reinforced fully convolutional densenet with multiscale kernel

In recent years, semantic segmentation has become one of the most active tasks in computer vision. Its goal is to group image pixels into semantically meaningful regions. Deep learning methods, in particular those that use convolutional neural networks (CNNs), have been highly successful at this task. In this paper, we introduce a semantic segmentation system that uses a reinforced fully convolutional DenseNet with a multiscale kernel prediction method. Our main contribution is an encoder-decoder architecture in which we increase the width of the dense blocks in the encoder by adding recurrent connections inside each dense block. We call the resulting structure a wider dense block, where the block takes not only the output of the previous layer but also the initial input of the dense block. This recurrent structure emulates the human brain system and helps to strengthen the extraction of the target features. As a result, our network becomes deeper and wider without any additional parameters, because the weights are shared. Moreover, a multiscale convolutional layer is applied after the last dense block of the decoder to perform model averaging over different spatial scales and to make the prediction more flexible. The proposed method is evaluated on two semantic segmentation benchmarks, CamVid and Cityscapes, where it outperforms many recent state-of-the-art methods.
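The two ideas in the abstract, recurrent connections with shared weights inside a dense block and averaging predictions over several kernel sizes, can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the update rule x_t = ReLU(W_ff * u + W_rec * x_{t-1}), the growth rate, the number of unrolling steps, and the kernel sizes (1, 3, 5) are all assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded 'same' convolution. x: (C_in, H, W); w: (C_out, C_in, k, k), k odd."""
    k = w.shape[-1]
    r = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (r, r), (r, r)))  # zero padding on the spatial axes only
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out

def recurrent_conv(u, w_ff, w_rec, steps):
    """Assumed update rule: x_t = relu(w_ff * u + w_rec * x_{t-1}).

    The layer input u is re-injected at every step, and the same two weight
    tensors are reused, so unrolling more steps adds depth but no parameters.
    """
    ff = conv2d_same(u, w_ff)
    x = np.maximum(ff, 0.0)
    for _ in range(steps):
        x = np.maximum(ff + conv2d_same(x, w_rec), 0.0)
    return x

def wider_dense_block(u, params, steps):
    """Dense connectivity: each layer sees the block input and all earlier outputs."""
    feats = [u]
    for w_ff, w_rec in params:
        layer_in = np.concatenate(feats, axis=0)
        feats.append(recurrent_conv(layer_in, w_ff, w_rec, steps))
    return np.concatenate(feats, axis=0)

def multiscale_predict(x, heads):
    """Average class scores from heads with different kernel sizes, then take argmax."""
    scores = np.mean([conv2d_same(x, w) for w in heads], axis=0)
    return scores.argmax(axis=0)

# Hypothetical toy configuration (not taken from the paper).
rng = np.random.default_rng(0)
c_in, growth, n_classes = 3, 4, 5
image = rng.standard_normal((c_in, 8, 8))

params, c = [], c_in
for _ in range(3):  # three layers in the block
    w_ff = rng.standard_normal((growth, c, 3, 3)) * 0.1
    w_rec = rng.standard_normal((growth, growth, 3, 3)) * 0.1
    params.append((w_ff, w_rec))
    c += growth

heads = [rng.standard_normal((n_classes, c, k, k)) * 0.1 for k in (1, 3, 5)]

n_params = sum(a.size + b.size for a, b in params)  # independent of `steps`
features = wider_dense_block(image, params, steps=2)
labels = multiscale_predict(features, heads)
print(features.shape, labels.shape, n_params)
```

In this sketch, changing `steps` alters how many times each layer is unrolled but leaves `n_params` untouched, which is the sense in which weight sharing makes a network deeper at no parameter cost. Averaging the scores of the 1x1, 3x3, and 5x5 heads is one simple way to read "model averaging over different spatial scales".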
