Deep super-class learning for long-tail distributed image classification

Abstract

Long-tail distributions are widespread in many practical applications, where most categories contain only a small number of samples. Because too few instances are available to describe the intra-class diversity of the minority classes, the separating hyperplanes learned by traditional machine learning methods are usually heavily skewed. Resampling techniques and cost-sensitive algorithms have been introduced to enhance the statistical power of the minority classes, but they cannot infer class boundaries that are more reliable than what the training samples themselves describe. To address this issue, we cluster the original categories into super-classes, which produces a relatively balanced distribution in the super-class space. Moreover, the knowledge shared among the categories belonging to the same super-class can improve the generalization of the minority classes. However, existing super-class construction methods have inherent disadvantages: taxonomy-based methods suffer from a gap between the semantic space and the feature space, and the performance of learning-based algorithms depends strongly on the features and the data distribution. In this paper, we propose a deep super-class learning (DSCL) model for long-tail distributed image classification. Motivated by the observation that classes belonging to the same super-class usually evaluate the features more similarly than classes belonging to different super-classes, we design a block-structured sparse constraint and attach it to the top of a convolutional neural network. The DSCL model thus accomplishes representation learning, classifier training, and super-class construction in a unified end-to-end learning procedure. We compare the proposed model with several super-class construction methods on two public image datasets. Experimental results show that the super-class construction strategy is effective for long-tail distributed classification and that the DSCL model achieves better results than the other methods.
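The abstract does not give the exact form of the block-structured sparse constraint. As a hedged illustration only, one common way to express "classes in the same super-class evaluate the features similarly" is a group-lasso style mixed norm over the top-layer classifier weights: for every feature and every super-class, the weights of that super-class's member classes form one group, and the penalty sums the l2 norms of the groups. The sketch below assumes a d × C weight matrix and a fixed super-class assignment; the function name `block_sparse_penalty` and the example matrices are hypothetical, and this is not DSCL's published formulation, which also learns the assignment end to end.

```python
import math
from typing import Sequence


def block_sparse_penalty(W: Sequence[Sequence[float]],
                         assignment: Sequence[int]) -> float:
    """Group-lasso style mixed norm over a d x C classifier weight matrix.

    W[j][c] is the weight of feature j for class c (rows = features).
    assignment[c] is the super-class index of class c (hypothetical, fixed).
    For every feature j and super-class g, the weights W[j, classes in g]
    form one group; the penalty sums the l2 norms of all groups, which
    pushes whole groups to zero so that classes in one super-class tend to
    select the same features.
    """
    num_super = max(assignment) + 1
    total = 0.0
    for row in W:  # one feature dimension at a time
        if len(row) != len(assignment):
            raise ValueError("each row of W needs one weight per class")
        squared = [0.0] * num_super  # squared l2 norm per super-class block
        for weight, g in zip(row, assignment):
            squared[g] += weight * weight
        total += sum(math.sqrt(s) for s in squared)
    return total


if __name__ == "__main__":
    # Hypothetical toy setup: classes 0 and 1 form super-class 0,
    # classes 2 and 3 form super-class 1.
    assignment = [0, 0, 1, 1]
    # Same total squared weight (four unit entries) in both matrices.
    aligned = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    scattered = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
    print(round(block_sparse_penalty(aligned, assignment), 3))    # 2.828
    print(round(block_sparse_penalty(scattered, assignment), 3))  # 4.0
```

With the same total squared weight, the matrix in which each super-class concentrates on its own features receives a smaller penalty than the one in which feature usage is scattered across super-classes. Adding a weighted multiple of such a penalty to the classification loss would therefore push the top-layer weights toward a block structure; how DSCL couples the constraint with learning the super-class assignment is not specified in the abstract and is not reproduced here.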