Learning multi-layer coarse-to-fine representations for large-scale image classification

Abstract Recent studies on large-scale image classification focus mainly on categorizing images into 1000 object classes that are atomic and mutually exclusive in the semantic space. However, in a much larger set of image categories (such as the ImageNet 10k dataset), some categories may come from high-level (non-leaf) nodes of the concept ontology and may semantically subsume other lower-level categories. Classifying images into large numbers of categories with such inter-category subsumption correlations has received little attention. In this paper, a Visual-Semantic Tree is learned to organize 10k image categories hierarchically in a coarse-to-fine fashion, with both inter-category visual similarities and inter-category semantic correlations integrated into the tree construction. In addition, a deep learning method is developed that integrates the Visual-Semantic Tree with deep CNNs to learn more discriminative tree classifiers for large-scale image classification. Our experimental results demonstrate that the proposed Visual-Semantic Tree can effectively organize large-scale structural image categories and significantly improve classification accuracy for both atomic and high-level image categories.
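The abstract does not specify how the visual and semantic cues are combined, so the following is only an illustrative sketch of the general idea, not the method of the paper. It assumes a simple linear blend of two similarity matrices (the weight `alpha` is hypothetical) followed by greedy average-linkage agglomerative clustering; the function names, category names, and similarity values are all invented for the example.

```python
from itertools import combinations


def combine_similarity(visual, semantic, alpha=0.5):
    """Blend two symmetric similarity matrices (lists of lists).

    alpha is a hypothetical mixing weight, not a parameter from the paper.
    """
    n = len(visual)
    return [
        [alpha * visual[i][j] + (1 - alpha) * semantic[i][j] for j in range(n)]
        for i in range(n)
    ]


def average_similarity(members_a, members_b, sim):
    """Average pairwise similarity between two groups of category indices."""
    total = sum(sim[i][j] for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))


def build_tree(names, sim):
    """Greedy average-linkage clustering; returns the tree as nested tuples."""
    # cluster id -> (member category indices, subtree built so far)
    clusters = {i: ([i], names[i]) for i in range(len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the two clusters that are most similar on average.
        a, b = max(
            combinations(sorted(clusters), 2),
            key=lambda p: average_similarity(clusters[p[0]][0], clusters[p[1]][0], sim),
        )
        members = clusters[a][0] + clusters[b][0]
        subtree = (clusters[a][1], clusters[b][1])
        del clusters[a], clusters[b]
        clusters[next_id] = (members, subtree)
        next_id += 1
    return next(iter(clusters.values()))[1]


# Toy data (invented): four categories with visual and semantic similarities.
names = ["cat", "dog", "car", "truck"]
visual = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
semantic = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]
combined = combine_similarity(visual, semantic)
tree = build_tree(names, combined)
print(tree)
```

On this toy input the two animal categories and the two vehicle categories are grouped first and then joined at the root, which is the coarse-to-fine organization the abstract describes. A real system would derive the visual similarities from learned features and the semantic ones from an ontology such as WordNet.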
