Occlusion localization based on convolutional neural networks

In most convolutional neural networks (CNNs), the output is a single classification result obtained by combining all the neuron activations in the last layer. Local connectivity is an important characteristic of CNNs: each neuron in the network corresponds to a local region of the original image. Hence, the local visibility of a target object can be estimated by analyzing the neuron activations of a vanilla network while it classifies the object. In this paper, a method to localize partial occlusions based on an off-the-shelf CNN is proposed. Unlike most existing foreground segmentation methods, the proposed method obtains classification results and foreground estimates simultaneously, with no dedicated foreground annotations and no extra network design. The contributions of the paper are twofold. First, a method to obtain occlusion maps within regions of interest is developed based on a vanilla object classification network. Second, several strategies to infer occlusion maps from the neuron activations are developed and tested. Preliminary results on both synthetic traffic signs and GTSRB traffic signs show the potential of the developed methods to infer local occlusions based on an off-the-shelf CNN.
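The correspondence between a neuron and a local image region can be made concrete with standard receptive-field arithmetic. The sketch below is illustrative only and is not the paper's occlusion-inference method: the function names and the four-layer configuration used in the usage note are hypothetical, and the code assumes a plain stack of one-dimensional convolution and pooling layers described by kernel size, stride, and padding.

```python
from typing import List, Tuple

# One layer is (kernel_size, stride, padding). This representation is an
# assumption made for illustration; it is not taken from the paper.
Layer = Tuple[int, int, int]


def receptive_field(layers: List[Layer]) -> Tuple[int, int, float]:
    """Return (size, jump, start) of last-layer neurons in input pixels.

    size  : width of the input region seen by one neuron
    jump  : distance in input pixels between adjacent neurons
    start : center coordinate of the first neuron (pixel i spans [i, i+1))
    """
    size, jump, start = 1, 1, 0.5
    for kernel, stride, padding in layers:
        # Update the center first, using the jump of the incoming layer.
        start += ((kernel - 1) / 2 - padding) * jump
        size += (kernel - 1) * jump
        jump *= stride
    return size, jump, start


def neuron_region(layers: List[Layer], index: int) -> Tuple[float, float]:
    """Return the (left, right) input interval covered by neuron `index`."""
    size, jump, start = receptive_field(layers)
    center = start + index * jump
    return center - size / 2, center + size / 2
```

For a hypothetical stack of a 5-wide convolution, a 2-wide pooling with stride 2, and the same pair repeated, `receptive_field([(5, 1, 0), (2, 2, 0), (5, 1, 0), (2, 2, 0)])` gives a region 16 pixels wide with adjacent neurons 4 pixels apart. An activation map from such a layer can therefore be read as a coarse, overlapping grid over the region of interest, which is the property an occlusion map built from neuron activations relies on.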
