Object Boundary Detection and Classification with Image-Level Labels

Semantic boundary and edge detection aims at simultaneously detecting object edge pixels in images and assigning class labels to them. Systematic training of predictors for this task requires the labeling of edges in images which is a particularly tedious task. We propose a novel strategy for solving this task, when pixel-level annotations are not available, performing it in an almost zero-shot manner by relying on conventional whole image neural net classifiers that were trained using large bounding boxes. Our method performs the following two steps at test time. Firstly it predicts the class labels by applying the trained whole image network to the test images. Secondly, it computes pixel-wise scores from the obtained predictions by applying backprop gradients as well as recent visualization algorithms such as deconvolution and layer-wise relevance propagation. We show that high pixel-wise scores are indicative for the location of semantic boundaries, which suggests that the semantic boundary problem can be approached without using edge labels during the training phase.

[1]  Alexander Binder,et al.  Analyzing Classifiers: Fisher Vectors and Deep Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Alexander Binder,et al.  The LRP Toolbox for Artificial Neural Networks , 2016, J. Mach. Learn. Res..

[5]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[6]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[7]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[9]  Jordi Pont-Tuset,et al.  Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Mario Fritz,et al.  Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Alexander Binder,et al.  Layer-Wise Relevance Propagation for Neural Networks with Local Renormalization Layers , 2016, ICANN.

[12]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[13]  Ming-Yu Liu,et al.  CASENet: Deep Category-Aware Semantic Edge Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Honglak Lee,et al.  Object Contour Detection with a Fully Convolutional Encoder-Decoder Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Bernt Schiele,et al.  Weakly Supervised Object Boundaries , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Alexander Binder,et al.  On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation , 2015, PloS one.

[17]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[18]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[19]  Aurélien Garivier,et al.  On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models , 2014, J. Mach. Learn. Res..

[20]  James M. Rehg,et al.  Unsupervised Learning of Edges , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Alexander Binder,et al.  Explaining nonlinear classification decisions with deep Taylor decomposition , 2015, Pattern Recognit..

[22]  Jianbo Shi,et al.  High-for-Low and Low-for-High: Efficient Boundary Detection from Deep Object Features and Its Applications to High-Level Vision , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[23]  Alexander Binder,et al.  Evaluating the Visualization of What a Deep Neural Network Has Learned , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[24]  Klaus-Robert Müller,et al.  Explaining Predictions of Non-Linear Classifiers in NLP , 2016, Rep4NLP@ACL.

[25]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[26]  Jürgen Schmidhuber,et al.  Evolving large-scale neural networks for vision-based reinforcement learning , 2013, GECCO '13.