Paper information - Learning multi-layer coarse-to-fine representations for large-scale image classification

Abstract: Recent studies on large-scale image classification mainly focus on categorizing images into 1000 object classes, all of which are atomic and mutually exclusive in the semantic space. However, in a much larger set of image categories (such as the ImageNet 10k dataset), some categories may come from the high-level (non-leaf) nodes of the concept ontology and may semantically contain other lower-level categories. Classifying images into large numbers of categories that have such inter-category subsumption correlations has received little attention. In this paper, a Visual-Semantic Tree is learned to organize 10k image categories hierarchically in a coarse-to-fine fashion, where both the inter-category visual similarities and the inter-category semantic correlations are integrated for tree construction. Additionally, a deep learning method is developed by integrating the Visual-Semantic Tree with deep CNNs to learn more discriminative tree classifiers for large-scale image classification. The experimental results demonstrate that the proposed Visual-Semantic Tree can effectively organize large-scale structural image categories and significantly boost the classification accuracy rates for both atomic image categories and high-level image categories.
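The abstract does not say how the visual and semantic cues are combined or how the tree is grown, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a simple weighted blend of two similarity matrices followed by greedy average-linkage merging. The function names, the mixing weight `alpha`, and the toy similarity scores are all hypothetical.

```python
from itertools import combinations

# Toy categories and made-up pairwise similarities (hypothetical values).
CATEGORIES = ["cat", "dog", "car", "truck"]
VISUAL = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]
SEMANTIC = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]


def combine_similarities(visual, semantic, alpha=0.5):
    """Blend two symmetric similarity matrices; alpha is an assumed mixing weight."""
    n = len(visual)
    return [
        [alpha * visual[i][j] + (1 - alpha) * semantic[i][j] for j in range(n)]
        for i in range(n)
    ]


def leaves(node):
    """Return the category indices under a node (an int leaf or a (left, right) tuple)."""
    if isinstance(node, int):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def average_similarity(a, b, sim):
    """Average-linkage similarity between two subtrees."""
    la, lb = leaves(a), leaves(b)
    return sum(sim[i][j] for i in la for j in lb) / (len(la) * len(lb))


def build_tree(sim):
    """Greedily merge the two most similar subtrees until one root remains."""
    nodes = list(range(len(sim)))
    while len(nodes) > 1:
        i, j = max(
            combinations(range(len(nodes)), 2),
            key=lambda p: average_similarity(nodes[p[0]], nodes[p[1]], sim),
        )
        merged = (nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


tree = build_tree(combine_similarities(VISUAL, SEMANTIC))
print(tree)  # ((0, 1), (2, 3)): animals and vehicles form the two coarse groups
```

In this toy setting the coarse level separates animals from vehicles and the fine level separates the individual categories, which mirrors the coarse-to-fine organization the abstract describes. A binary merge tree is used here only for brevity.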
