Growing Interpretable Part Graphs on ConvNets via Multi-Shot Learning

This paper proposes a learning strategy that embeds object-part concepts into a pre-trained convolutional neural network (CNN), in an attempt to 1) explore explicit semantics hidden in CNN units and 2) gradually transform the pre-trained CNN into a semantically interpretable graphical model for hierarchical object understanding. Given part annotations on very few (e.g., 3-12) objects, our method mines certain latent patterns from the pre-trained CNN and associates them with different semantic parts. We use a four-layer And-Or graph to organize the CNN units, so as to clarify their internal semantic hierarchy. Our method is guided by a small number of part annotations, and it achieves superior part-localization performance (about 13%-107% improvement in part center prediction on the PASCAL VOC and ImageNet datasets)

[1]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[2]  Trevor Darrell,et al.  Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Yao Lu Unsupervised Learning on Neural Network Outputs , 2015, ArXiv.

[5]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Wenze Hu,et al.  Modeling Occlusion by Discriminative AND-OR Structures , 2013, 2013 IEEE International Conference on Computer Vision.

[7]  Sanja Fidler,et al.  Detect What You Can: Detecting and Representing Objects Using Holistic Models and Body Parts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[9]  Andrea Vedaldi,et al.  Understanding deep image representations by inverting them , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Mathieu Aubry,et al.  Understanding Deep Features with Computer-Generated Imagery , 2015, ICCV.

[11]  Thomas Brox,et al.  Discriminative Unsupervised Feature Learning with Convolutional Neural Networks , 2014, NIPS.

[12]  Ivan Laptev,et al.  Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Song-Chun Zhu,et al.  Learning AND-OR Templates for Object Recognition and Detection , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Zhuowen Tu,et al.  What Happened to My Dog in That Network: Unraveling Top-down Generators in Convolutional Neural Networks , 2015, ArXiv.

[15]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[16]  Marcel Simon,et al.  Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Yao Lu Unsupervised Learning on Neural Network Outputs: With Application in Zero-Shot Learning , 2016, IJCAI.

[18]  Yifei Lu,et al.  Max Margin AND/OR Graph learning for parsing the human body , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Pietro Perona,et al.  Strong supervision from weak annotation: Interactive training of deformable part models , 2011, 2011 International Conference on Computer Vision.

[20]  James L. McClelland,et al.  What Learning Systems do Intelligent Agents Need? Complementary Learning Systems Theory Updated , 2016, Trends in Cognitive Sciences.

[21]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[22]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[23]  Joachim Denzler,et al.  Part Detector Discovery in Deep Convolutional Neural Networks , 2014, ACCV.

[24]  Ivan Laptev,et al.  Object Detection Using Strongly-Supervised Deformable Part Models , 2012, ECCV.

[25]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[26]  S. Tsogkas,et al.  Deep Learning for Semantic Part Segmentation with High-Level Guidance , 2015 .

[27]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[28]  Bolei Zhou,et al.  Object Detectors Emerge in Deep Scene CNNs , 2014, ICLR.

[29]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[30]  Anton van den Hengel,et al.  The treasure beneath convolutional layers: Cross-convolutional-layer pooling for image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Thomas Brox,et al.  Inverting Visual Representations with Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[33]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[34]  Quanshi Zhang,et al.  Mining And-Or Graphs for Graph Matching and Object Discovery , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[35]  David A. McAllester,et al.  Cascade object detection with deformable part models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[36]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[37]  Xuan Song,et al.  Object Discovery: Soft Attributed Graph Mining , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Xuan Song,et al.  Attributed Graph Mining and Matching: An Attempt to Define and Extract Soft Attributed Patterns , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.