论文信息 - Hyper-class augmented and regularized deep learning for fine-grained image classification

Hyper-class augmented and regularized deep learning for fine-grained image classification

Deep convolutional neural networks (CNN) have seen tremendous success in large-scale generic object recognition. In comparison with generic object recognition, fine-grained image classification (FGIC) is much more challenging because (i) fine-grained labeled data is much more expensive to acquire (usually requiring domain expertise); (ii) there exists large intra-class and small inter-class variance. Most recent work exploiting deep CNN for image recognition with small training data adopts a simple strategy: pre-train a deep CNN on a large-scale external dataset (e.g., ImageNet) and fine-tune on the small-scale target data to fit the specific classification task. In this paper, beyond the fine-tuning strategy, we propose a systematic framework of learning a deep CNN that addresses the challenges from two new perspectives: (i) identifying easily annotated hyper-classes inherent in the fine-grained data and acquiring a large number of hyper-class-labeled images from readily available external sources (e.g., image search engines), and formulating the problem into multitask learning; (ii) a novel learning model by exploiting a regularization between the fine-grained recognition model and the hyper-class recognition model. We demonstrate the success of the proposed framework on two small-scale fine-grained datasets (Stanford Dogs and Stanford Cars) and on a large-scale car dataset that we collected.

[1] Christopher Kanan,et al. Fine-grained object recognition with Gnostic Fields , 2014, IEEE Winter Conference on Applications of Computer Vision.

[2] Rich Caruana,et al. Multitask Learning , 1997, Machine-mediated learning.

[3] Nitish Srivastava,et al. Discriminative Transfer Learning with Tree-based Priors , 2013, NIPS.

[4] Yang Wang,et al. A Discriminative Latent Model of Object Classes and Attributes , 2010, ECCV.

[5] Nasser M. Nasrabadi,et al. Pattern Recognition and Machine Learning , 2006, Technometrics.

[6] Ramakant Nevatia,et al. Robust multi-view car detection using unsupervised sub-categorization , 2009, 2009 Workshop on Applications of Computer Vision (WACV).

[7] Ming Yang,et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8] David A. McAllester,et al. Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Jonathan Krause,et al. Learning Features and Parts for Fine-Grained Recognition , 2014, 2014 22nd International Conference on Pattern Recognition.

[10] Fei-Fei Li,et al. Combining randomization and discrimination for fine-grained image categorization , 2011, CVPR 2011.

[11] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[12] Yihong Gong,et al. Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13] Gabriela Csurka,et al. Visual categorization with bags of keypoints , 2002, eccv 2004.

[14] Kun Duan,et al. Attribute-based vehicle recognition using viewpoint-aware multiple instance SVMs , 2014, IEEE Winter Conference on Applications of Computer Vision.

[15] Christian Szegedy,et al. DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Antoni B. Chan,et al. Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[17] Christoph H. Lampert,et al. Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[18] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[19] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Trevor Darrell,et al. PANDA: Pose Aligned Networks for Deep Attribute Modeling , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21] Arnold W. M. Smeulders,et al. Fine-Grained Categorization by Alignments , 2013, 2013 IEEE International Conference on Computer Vision.

[22] Linda G. Shapiro,et al. Unsupervised Template Learning for Fine-Grained Object Recognition , 2012, NIPS.

[23] Trevor Darrell,et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[24] Jasha Droppo,et al. Multi-task learning in deep neural networks for improved phoneme recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[25] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[26] G LoweDavid,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[27] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28] Yihong Gong,et al. Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[29] Ali Farhadi,et al. Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[30] Massimiliano Pontil,et al. Regularized multi--task learning , 2004, KDD.

[31] Jonathan Krause,et al. 3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[32] Qiang Chen,et al. Network In Network , 2013, ICLR.

[33] Jürgen Schmidhuber,et al. Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[34] Larry S. Davis,et al. Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance , 2011, 2011 International Conference on Computer Vision.

[35] Jonathan Krause,et al. Collecting a Large-scale Dataset of Fine-grained Cars , 2013 .

[36] Jieping Ye,et al. A convex formulation for learning shared structures from multiple tasks , 2009, ICML '09.

[37] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[38] Forrest N. Iandola,et al. Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction , 2013, 2013 IEEE International Conference on Computer Vision.

[39] Jitendra Malik,et al. Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[40] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[41] Samy Bengio,et al. Large-Scale Object Classification Using Label Relation Graphs , 2014, ECCV.

[42] Arnold W. M. Smeulders,et al. Local Alignments for Fine-Grained Categorization , 2014, International Journal of Computer Vision.

[43] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[44] Zhuowen Tu,et al. Deeply-Supervised Nets , 2014, AISTATS.

[45] Trevor Darrell,et al. Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.

[46] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[47] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .