Learning Rich Part Hierarchies With Progressive Attention Networks for Fine-Grained Image Recognition

We investigate the localization of subtle yet discriminative parts for fine-grained image recognition. Based on the observation that such parts typically exist within a hierarchical structure (e.g., from a coarse-scale “head” to a fine-scale “eye” when recognizing bird species), we propose a novel progressive-attention convolutional neural network (PA-CNN) to progressively localize parts at multiple scales. The PA-CNN localizes parts in two steps, where a part proposal network (PPN) generates multiple local attention maps, and a part rectification network (PRN) learns part-specific features from each proposal and provides the PPN with refined part locations. This coupling of the PPN and PRN allows them to be optimized in a mutually reinforcing manner, leading to improved pinpointing of fine-grained parts. Moreover, the convolutional parameters for a PPN at a finer scale can be inherited from the PRN at a coarser scale, enabling a rich part hierarchy (e.g., eye and beak in a bird’s head) to be learned in a stacked fashion. Case studies show that PA-CNN can precisely identify parts without using bounding box/part annotations. In addition, quantitative evaluations demonstrate that PA-CNN yields state-of-the-art performance in three challenging fine-grained recognition tasks. i.e., CUB-2000–2011, FGVC-Aircraft, and Stanford Cars.

[1]  Wu Liu,et al.  Large-scale vehicle re-identification in urban surveillance videos , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).

[2]  Xiu-Shen Wei,et al.  Mask-CNN: Localizing parts and selecting descriptors for fine-grained bird species categorization , 2018, Pattern Recognit..

[3]  C. V. Jawahar,et al.  The truth about cats and dogs , 2011, 2011 International Conference on Computer Vision.

[4]  Qilong Wang,et al.  Is Second-Order Information Helpful for Large-Scale Visual Recognition? , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Iasonas Kokkinos,et al.  Segmentation-Aware Convolutional Networks Using Local Attention Masks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[7]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[8]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Marcel Simon,et al.  Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Yuxin Peng,et al.  Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN , 2017, ACM Multimedia.

[12]  Steven Henikoff,et al.  SIFT: predicting amino acid changes that affect protein function , 2003, Nucleic Acids Res..

[13]  Subhransu Maji,et al.  Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.

[14]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[15]  Raquel Urtasun,et al.  Understanding the Effective Receptive Field in Deep Convolutional Neural Networks , 2016, NIPS.

[16]  Tao Mei,et al.  Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Lei Wang,et al.  Beyond Covariance: Feature Representation with Nonlinear Kernel Matrices , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Jonathan Krause,et al.  The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition , 2015, ECCV.

[20]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[21]  Qi Tian,et al.  Picking Deep Filter Responses for Fine-Grained Image Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Michael Lam,et al.  Fine-Grained Recognition as HSnet Search for Informative Image Parts , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Naila Murray,et al.  Revisiting the Fisher vector for fine-grained classification , 2014, Pattern Recognit. Lett..

[24]  Yuxin Peng,et al.  The application of two-level attention models in deep convolutional neural network for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[26]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[27]  Xiao Liu,et al.  Fully Convolutional Attention Networks for Fine-Grained Recognition , 2016 .

[28]  Joachim Denzler,et al.  Generalized Orderless Pooling Performs Implicit Salient Matching , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[29]  Andrew Zisserman,et al.  A Visual Vocabulary for Flower Classification , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[30]  Cewu Lu,et al.  Deep LAC: Deep localization, alignment and classification for fine-grained recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Ahmed M. Elgammal,et al.  SPDA-CNN: Unifying Semantic Part Detection and Abstraction for Fine-Grained Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Zhiqiang Shen,et al.  Multiple Granularity Descriptors for Fine-Grained Categorization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Ya Zhang,et al.  Part-Stacked CNN for Fine-Grained Visual Categorization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Jonathan Krause,et al.  Fine-grained recognition without part annotations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Yuxin Peng,et al.  Object-Part Attention Model for Fine-Grained Image Classification , 2017, IEEE Transactions on Image Processing.

[36]  Subhransu Maji,et al.  Improved Bilinear Pooling with CNNs , 2017, BMVC.

[37]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[38]  Yang Gao,et al.  Compact Bilinear Pooling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Qilong Wang,et al.  Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Xiao Liu,et al.  Kernel Pooling for Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Jonghyun Choi,et al.  Mining Discriminative Triplets of Patches for Fine-Grained Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Pietro Perona,et al.  Bird Species Categorization Using Pose Normalized Deep Convolutional Nets , 2014, ArXiv.

[43]  Shu Kong,et al.  Low-Rank Bilinear Pooling for Fine-Grained Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Yuxin Peng,et al.  StackDRL: Stacked Deep Reinforcement Learning for Fine-grained Visual Categorization , 2018, IJCAI.

[45]  Tao Mei,et al.  Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Subhransu Maji,et al.  Deep filter banks for texture recognition and segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Seung Woo Lee,et al.  Birdsnap: Large-Scale Fine-Grained Visual Categorization of Birds , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[49]  Bo Zhao,et al.  Diversified Visual Attention Networks for Fine-Grained Object Classification , 2016, IEEE Transactions on Multimedia.

[50]  Trevor Darrell,et al.  Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.

[51]  Lei Zhang,et al.  Higher-Order Integration of Hierarchical Convolutional Activations for Fine-Grained Visual Categorization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[52]  Alexander J. Smola,et al.  Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).