Ontology-driven hierarchical sparse coding for large-scale image classification

Abstract An ontology-driven hierarchical sparse representation is developed in this paper to support hierarchical learning for large-scale image classification. Firstly, a two-layer ontology (a semantic ontology and a visual ontology) is built to organize a large number of image classes hierarchically: WordNet is used to construct the semantic ontology, and deep features extracted by Inception V3 are used to construct the visual ontology (visual tree). Secondly, a novel algorithm based on Split Bregman iteration is developed to learn the hierarchical sparse representation, i.e., a shared dictionary and a set of class-specific dictionaries that follow the two-layer ontology. For multi-class image classification, a tree classifier is then trained according to the two-layer ontology using the hierarchical sparse representation. Thirdly, for a given test image, multiple paths through the tree are evaluated simultaneously to predict its class label. The proposed approach has been evaluated on three benchmark datasets (ILSVRC2010, SUN397, and Caltech256). The experimental results show that it outperforms the original joint dictionary learning method and achieves higher accuracy than other approaches that use handcrafted features.
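The multi-path evaluation step can be illustrated with a minimal sketch. The abstract does not say how paths are scored or how many are kept, so this example makes assumptions: each node classifier is taken to output a probability for each child, a path is scored by the sum of log-probabilities along it, and a simple beam search keeps the best few partial paths at each level. The function name, the `tree` and `scores` layouts, and the toy class names are all hypothetical, not taken from the paper.

```python
import heapq
import math


def multi_path_predict(tree, scores, root="root", beam_width=2):
    """Return (leaf, log_score) for the best root-to-leaf path found by beam search.

    tree   : dict mapping each internal node to a list of its children
             (hypothetical layout, not the paper's data structure)
    scores : dict mapping each non-root node to the probability its parent's
             node classifier assigns to it for one test image (assumed to be
             precomputed and strictly positive)
    """
    beam = [(0.0, root)]  # (accumulated log-probability, node)
    finished = []         # completed root-to-leaf paths
    while beam:
        candidates = []
        for logp, node in beam:
            children = tree.get(node, [])
            if not children:
                # A node without children is a leaf: the path is complete.
                finished.append((logp, node))
                continue
            for child in children:
                candidates.append((logp + math.log(scores[child]), child))
        # Keep only the beam_width best partial paths for the next level.
        beam = heapq.nlargest(beam_width, candidates)
    best_logp, best_leaf = max(finished)
    return best_leaf, best_logp


# Toy two-level tree with made-up scores for a single test image.
tree = {
    "root": ["animal", "vehicle"],
    "animal": ["cat", "dog"],
    "vehicle": ["car", "truck"],
}
scores = {
    "animal": 0.45, "vehicle": 0.55,
    "cat": 0.9, "dog": 0.1,
    "car": 0.6, "truck": 0.4,
}

greedy_leaf, _ = multi_path_predict(tree, scores, beam_width=1)
beam_leaf, _ = multi_path_predict(tree, scores, beam_width=2)
print(greedy_leaf, beam_leaf)
```

With these toy scores, a single-path (greedy) traversal commits to "vehicle" at the first level and ends at "car" (path probability 0.33), while keeping two paths recovers "cat" (0.405). This is the kind of early-decision error that evaluating multiple paths is meant to avoid.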
