Incremental and Multi-Task Learning Strategies for Coarse-To-Fine Semantic Segmentation

The semantic understanding of a scene is a key problem in computer vision. In this work, we address the multi-level semantic segmentation task, in which a deep neural network is first trained to recognize an initial, coarse set of a few classes. Then, in an incremental-like approach, it is adapted to segment and label new object categories that are hierarchically derived by subdividing the classes of the initial set. We propose a set of strategies in which the output of the coarse classifiers is fed to the architectures performing the finer classification. Furthermore, we investigate the possibility of predicting the different levels of semantic understanding jointly, which also helps to achieve higher accuracy. Experimental results on the New York University Depth v2 (NYUDv2) dataset provide promising insights into multi-level scene understanding.
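The coarse-to-fine idea described above can be illustrated with a minimal per-pixel sketch. It is not the architecture used in this work: the two-level hierarchy, the class names, and the helper functions `fine_stage_input` and `coarse_from_fine` are all assumptions introduced for illustration. The sketch shows one plausible way of feeding the coarse output to a finer stage (appending the coarse class probabilities to the pixel's input values) and how a fine prediction can be mapped back to the coarse level through the hierarchy.

```python
import math

# Hypothetical two-level hierarchy (an assumption, not taken from the paper):
# every fine class has exactly one coarse parent.
FINE_TO_COARSE = {
    "bed": "furniture",
    "chair": "furniture",
    "wall": "structure",
    "floor": "structure",
}
COARSE_CLASSES = ["furniture", "structure"]


def softmax(scores):
    """Turn raw per-class scores into probabilities."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fine_stage_input(rgb, coarse_scores):
    """Append the coarse class probabilities to a pixel's RGB values.

    The result is what a (hypothetical) finer classifier would receive
    for this pixel: the colour channels plus one channel per coarse class.
    """
    return list(rgb) + softmax(coarse_scores)


def coarse_from_fine(fine_probs):
    """Sum fine-class probabilities over the children of each coarse class."""
    totals = {name: 0.0 for name in COARSE_CLASSES}
    for fine_name, prob in fine_probs.items():
        totals[FINE_TO_COARSE[fine_name]] += prob
    return totals


# One pixel: RGB values plus raw coarse scores for (furniture, structure).
pixel = fine_stage_input((0.2, 0.4, 0.6), coarse_scores=[2.0, 0.0])

# A fine prediction for the same pixel, aggregated back to the coarse level.
coarse = coarse_from_fine({"bed": 0.5, "chair": 0.2, "wall": 0.2, "floor": 0.1})
```

In a real network the same concatenation would be applied to whole feature maps rather than to single pixels, and the aggregation step offers a simple check that predictions at the two levels agree.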
