ViewAL: Active Learning With Viewpoint Entropy for Semantic Segmentation

We propose ViewAL, a novel active learning strategy for semantic segmentation that exploits viewpoint consistency in multi-view datasets. Our core idea is that inconsistencies in model predictions across viewpoints provide a very reliable measure of uncertainty and encourage the model to perform well irrespective of the viewpoint under which objects are observed. To incorporate this uncertainty measure, we introduce a new viewpoint entropy formulation, which is the basis of our active learning strategy. In addition, we propose computing uncertainty at the superpixel level, which exploits the inherently localized signal in the segmentation task and directly lowers annotation costs. This combination of viewpoint entropy and superpixels allows us to efficiently select samples that are highly informative for improving the network. We demonstrate that our proposed active learning strategy not only yields the best-performing models for the same amount of labeled data, but also significantly reduces labeling effort. For instance, our method achieves 95% of the maximum achievable network performance using only 7%, 17%, and 24% labeled data on SceneNet-RGBD, ScanNet, and Matterport3D, respectively. On these datasets, the best state-of-the-art method needs 14%, 27%, and 33% labeled data to reach the same performance. Finally, we demonstrate that labeling with superpixels yields ground truth of the same quality as labeling whole images, but requires 25% less time.
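
The abstract does not state the viewpoint entropy formula, so the following is only a minimal sketch of one plausible reading: average the class probabilities predicted for the same surface point from several viewpoints, take the entropy of that mean distribution, and then average the per-pixel scores inside each superpixel. The function names `viewpoint_entropy` and `superpixel_score`, and the assumption that cross-view correspondences and superpixel ids are already available, are illustrative and not taken from the paper.

```python
import math

def viewpoint_entropy(view_probs):
    """Entropy of the mean class distribution over several views of one point.

    view_probs: list of per-view class-probability lists (assumed to sum to 1).
    Consistent predictions give low entropy; conflicting ones give high entropy.
    """
    n_views = len(view_probs)
    n_classes = len(view_probs[0])
    # Mean probability per class across all viewpoints.
    mean = [sum(p[c] for p in view_probs) / n_views for c in range(n_classes)]
    # Shannon entropy (natural log); zero-probability classes contribute nothing.
    return -sum(q * math.log(q) for q in mean if q > 0.0)

def superpixel_score(pixel_scores, superpixel_ids):
    """Average a per-pixel score within each superpixel (hypothetical aggregation)."""
    totals, counts = {}, {}
    for score, sid in zip(pixel_scores, superpixel_ids):
        totals[sid] = totals.get(sid, 0.0) + score
        counts[sid] = counts.get(sid, 0) + 1
    return {sid: totals[sid] / counts[sid] for sid in totals}

if __name__ == "__main__":
    agree = viewpoint_entropy([[0.9, 0.1], [0.9, 0.1]])
    conflict = viewpoint_entropy([[0.9, 0.1], [0.1, 0.9]])
    print(agree < conflict)  # views that disagree score as more uncertain
    print(superpixel_score([agree, conflict, conflict], [0, 0, 1]))
```

Under this reading, superpixels with the highest aggregated score would be the ones sent for annotation; how the paper actually ranks and selects regions is not specified in the abstract.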
