Part-Aware Fine-Grained Object Categorization Using Weakly Supervised Part Detection Network

Fine-grained object categorization aims for distinguishing objects of subordinate categories that belong to the same entry-level object category. It is a rapidly developing subfield in multimedia content analysis. The task is challenging due to the facts that (1) training images with ground-truth labels are difficult to obtain, and (2) variations among different subordinate categories are subtle. It is well established that characterizing features of different subordinate categories are located on local parts of object instances. However, manually annotating object parts requires expertise, which is also difficult to generalize to new fine-grained categorization tasks. In this work, we propose a Weakly Supervised Part Detection Network (PartNet) that is able to detect discriminative local parts for the use of fine-grained categorization. A vanilla PartNet builds on top of a base subnetwork two parallel streams of upper network layers, which respectively compute scores of classification probabilities (over subordinate categories) and detection probabilities (over a specified number of discriminative part detectors) for local regions of interest (RoIs). The image-level prediction is obtained by aggregating element-wise products of these region-level probabilities, and meanwhile diverse part detectors can be learned in an end-to-end fashion under the image-level supervision. To generate a diverse set of RoIs as inputs of PartNet, we propose a simple Discretized Part Proposals module (DPP) that directly targets for proposing candidates of discriminative local parts, with no bridging via object-level proposals. Experiments on benchmark datasets of CUB-200-2011, Oxford Flower 102 and Oxford-IIIT Pet show the efficacy of our proposed method for both discriminative part detection and fine-grained categorization. In particular, we achieve the new state-of-the-art performance on CUB-200-2011 and Oxford-IIIT Pet datasets when ground-truth part annotations are not available.

[1]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[2]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[3]  C. V. Jawahar,et al.  Cats and dogs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Trevor Darrell,et al.  Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Ahmed M. Elgammal,et al.  SPDA-CNN: Unifying Semantic Part Detection and Abstraction for Fine-Grained Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Qi Tian,et al.  Fine-Grained Image Search , 2015, IEEE Transactions on Multimedia.

[7]  Seung Woo Lee,et al.  Birdsnap: Large-Scale Fine-Grained Visual Categorization of Birds , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[9]  Marcel Simon,et al.  Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Hai Jin,et al.  Landmark Classification With Hierarchical Multi-Modal Exemplar Feature , 2015, IEEE Transactions on Multimedia.

[11]  Wenyu Liu,et al.  Multiple Instance Detection Network with Online Instance Classifier Refinement , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[13]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[15]  Dacheng Tao,et al.  Improving Training of Deep Neural Networks via Singular Value Bounding , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[17]  Qi Tian,et al.  Picking Deep Filter Responses for Fine-Grained Image Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Andrea Vedaldi,et al.  Weakly Supervised Deep Detection Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Ya Zhang,et al.  Part-Stacked CNN for Fine-Grained Visual Categorization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Songyang Lao,et al.  Bag of Surrogate Parts Feature for Visual Recognition , 2018, IEEE Transactions on Multimedia.

[22]  Tao Mei,et al.  Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Zhihai He,et al.  Task-Driven Progressive Part Localization for Fine-Grained Object Recognition , 2016, IEEE Transactions on Multimedia.

[24]  Jonathan Krause,et al.  Fine-grained recognition without part annotations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Yuxin Peng,et al.  The application of two-level attention models in deep convolutional neural network for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Qi Tian,et al.  Weak to Strong Detector Learning for Simultaneous Classification and Localization , 2019, IEEE Transactions on Circuits and Systems for Video Technology.

[27]  Andrew Zisserman,et al.  Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[28]  Yuxin Peng,et al.  Object-Part Attention Model for Fine-Grained Image Classification , 2017, IEEE Transactions on Image Processing.

[29]  Michael Lam,et al.  Fine-Grained Recognition as HSnet Search for Informative Image Parts , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[31]  In-So Kweon,et al.  Multi-scale pyramid pooling for deep convolutional representation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[32]  Svetlana Lazebnik,et al.  Where to Buy It: Matching Street Clothing Photos in Online Shops , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  利久 亀井,et al.  California Institute of Technology , 1958, Nature.

[34]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[35]  Yang Gao,et al.  Fine-grained pose prediction, normalization, and recognition , 2015, ArXiv.

[36]  Li Fei-Fei,et al.  DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Ivan Laptev,et al.  ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization , 2016, ECCV.

[38]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[40]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Xiao Liu,et al.  Fully Convolutional Attention Networks for Fine-Grained Recognition , 2016 .

[42]  Zhichao Li,et al.  Dynamic Computational Time for Visual Attention , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[43]  Manohar Paluri,et al.  Metric Learning with Adaptive Density Discrimination , 2015, ICLR.

[44]  Hongliang Li,et al.  PBC: Polygon-Based Classifier for Fine-Grained Categorization , 2017, IEEE Transactions on Multimedia.

[45]  Daniel Tretter,et al.  Personal Clothing Retrieval on Photo Collections by Color and Attributes , 2013, IEEE Transactions on Multimedia.

[46]  Jianfei Cai,et al.  Weakly Supervised Fine-Grained Categorization With Part-Based Image Representation , 2016, IEEE Transactions on Image Processing.

[47]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[48]  Feng Zhou,et al.  Fine-Grained Image Classification by Exploring Bipartite-Graph Labels , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Fei-Fei Li,et al.  Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs , 2012 .

[50]  Dacheng Tao,et al.  Real Time Fine-Grained Categorization with Accuracy and Interpretability , 2016, ArXiv.

[51]  Tao Mei,et al.  Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Yongdong Zhang,et al.  Coarse-to-Fine Description for Fine-Grained Visual Categorization , 2016, IEEE Transactions on Image Processing.

[53]  Weiyao Lin,et al.  Picking Neural Activations for Fine-Grained Recognition , 2017, IEEE Transactions on Multimedia.

[54]  Bo Zhao,et al.  Diversified Visual Attention Networks for Fine-Grained Object Classification , 2016, IEEE Transactions on Multimedia.

[55]  Trevor Darrell,et al.  Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.