Integrating multi-level deep learning and concept ontology for large-scale visual recognition

Abstract: To support large-scale visual recognition (i.e., recognizing thousands or even tens of thousands of object classes), a multi-level deep learning algorithm is developed to learn multiple deep networks and a tree classifier jointly. A concept ontology is constructed to organize the large number of object classes hierarchically in a coarse-to-fine fashion and to determine the inter-related learning tasks automatically. Our multi-level deep learning algorithm can: (a) train multiple deep networks simultaneously to achieve more discriminative representations of both the coarse-grained groups and the fine-grained object classes at different levels of the concept ontology (i.e., learn multiple sets of deep features simultaneously for different tasks); (b) leverage multi-task learning to train more discriminative classifiers for the fine-grained object classes in the same group, which enhances their separability significantly and enables inter-class knowledge transfer; and (c) learn the multiple deep networks and the tree classifier jointly in an end-to-end fashion. Our experimental results on three image sets demonstrate that the multi-level deep learning algorithm achieves very competitive accuracy and computational efficiency for large-scale visual recognition.
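The coarse-to-fine organization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two-level `ONTOLOGY`, the function `predict_coarse_to_fine`, and the hand-written score dictionaries are all hypothetical stand-ins for the outputs of the group-level and class-level networks. The sketch only shows how a tree classifier might first pick a coarse-grained group and then compare fine-grained classes within that group alone.

```python
import math

# Hypothetical two-level concept ontology: coarse group -> fine-grained classes.
ONTOLOGY = {
    "dog": ["beagle", "poodle"],
    "cat": ["siamese", "persian"],
}


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_coarse_to_fine(group_scores, class_scores):
    """Pick the best coarse group, then the best fine class inside it.

    group_scores and class_scores are assumed to be raw scores produced by
    the group-level and class-level models (plain dicts in this sketch).
    Returns (group, fine_class, joint_probability).
    """
    groups = list(ONTOLOGY)
    group_probs = softmax([group_scores[g] for g in groups])
    g_idx = max(range(len(groups)), key=lambda i: group_probs[i])
    best_group = groups[g_idx]

    # Only the sibling classes of the selected group are compared.
    classes = ONTOLOGY[best_group]
    class_probs = softmax([class_scores[c] for c in classes])
    c_idx = max(range(len(classes)), key=lambda i: class_probs[i])

    # Joint probability = P(group) * P(class | group).
    joint = group_probs[g_idx] * class_probs[c_idx]
    return best_group, classes[c_idx], joint


if __name__ == "__main__":
    group_scores = {"dog": 2.0, "cat": 0.5}
    class_scores = {"beagle": 0.1, "poodle": 1.3, "siamese": 3.0, "persian": 0.0}
    print(predict_coarse_to_fine(group_scores, class_scores))
```

Because the fine-grained decision is restricted to one group, only the sibling classes are scored against each other at the second level, which is one plausible source of the computational efficiency the abstract mentions. The greedy top-down choice used here is a simplification; a jointly trained tree classifier need not commit to a single group this way.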
