论文信息 - Learning fine-grained features via a CNN Tree for Large-scale Classification

Learning fine-grained features via a CNN Tree for Large-scale Classification

We propose a novel approach to enhance the discriminability of Convolutional Neural Networks (CNN). The key idea is to build a tree structure that could progressively learn fine-grained features to distinguish a subset of classes, by learning features only among these classes. Such features are expected to be more discriminative, compared to features learned for all the classes. We develop a new algorithm to effectively learn the tree structure from a large number of classes. Experiments on large-scale image classification tasks demonstrate that our method could boost the performance of a given basic CNN model. Our method is quite general, hence it can potentially be used in combination with many other deep learning models.

[1] Weilin Huang,et al. Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees , 2014, ECCV.

[2] Prashant J. Shenoy,et al. Sharing-aware algorithms for virtual machine colocation , 2011, SPAA '11.

[3] Pietro Perona,et al. Learning and using taxonomies for fast visual categorization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[5] Matthew J. Hausknecht,et al. Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[7] Shengen Yan,et al. Deep Image: Scaling up Image Recognition , 2015, ArXiv.

[8] Sunita Sarawagi,et al. Discriminative Methods for Multi-labeled Classification , 2004, PAKDD.

[9] Yi Li,et al. Robust Online Visual Tracking with a Single Convolutional Neural Network , 2014, ACCV.

[10] Yaroslav Bulatov,et al. Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks , 2013, ICLR.

[11] Andrew G. Howard,et al. Some Improvements on Deep Convolutional Neural Network Based Image Classification , 2013, ICLR.

[12] Gang Wang,et al. Integrating parametric and non-parametric models for scene labeling , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Antonio Torralba,et al. Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[14] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[16] LeCunYann,et al. Learning Hierarchical Features for Scene Labeling , 2013 .

[17] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[19] Xiaogang Wang,et al. DeepID-Net: Deformable deep convolutional neural networks for object detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Bolei Zhou,et al. Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[21] Alexander C. Berg,et al. Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition , 2011, NIPS.

[22] Bin Fan,et al. L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Jason Weston,et al. Label Embedding Trees for Large Multi-Class Tasks , 2010, NIPS.

[24] Jürgen Schmidhuber,et al. Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25] Dumitru Erhan,et al. Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Jonathan Tompson,et al. Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[27] Gang Wang,et al. Video tracking using learned hierarchical features. , 2015, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society.

[28] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[29] Svetlana Lazebnik,et al. Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[30] Tao Wang,et al. End-to-end text recognition with convolutional neural networks , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[31] Sunita Sarawagi,et al. Scaling multi-class support vector machines using inter-class confusion , 2002, KDD.

[32] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[34] Yann LeCun,et al. Regularization of Neural Networks using DropConnect , 2013, ICML.

[35] Jian Sun,et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[37] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[38] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[39] Christian Szegedy,et al. DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[40] Christophe Garcia,et al. text Detection with Convolutional Neural Networks , 2008, VISAPP.

[41] Dumitru Erhan,et al. Scalable, High-Quality Object Detection , 2014, ArXiv.

[42] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[43] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[44] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[45] Seunghoon Hong,et al. Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network , 2015, ICML.

[46] Fei-Fei Li,et al. What Does Classifying More Than 10, 000 Image Categories Tell Us? , 2010, ECCV.

[47] Andrew Zisserman,et al. Deep Structured Output Learning for Unconstrained Text Recognition , 2014, ICLR.

[48] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[49] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[50] Jonathan Tompson,et al. Learning Human Pose Estimation Features with Convolutional Networks , 2013, ICLR.

[51] Camille Couprie,et al. Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52] Yoshua Bengio,et al. Maxout Networks , 2013, ICML.