Explore Visual Concept Formation for Image Classification

Human beings acquire the ability of image classification through visual concept learning, in which the process of concept formation involves intertwined searches of common properties and concept descriptions. However, in most image classification algorithms using deep convolutional neural network (ConvNet), the representation space is constructed under the premise that concept descriptions are fixed as one-hot codes, which limits the mining of properties and the ability of identifying unseen samples. Inspired by this, we propose a learning strategy of visual concept formation (LSOVCF) based on the ConvNet, in which the two intertwined parts of concept formation, i.e. feature extraction and concept description, are learned together. First, LSOVCF takes sample response in the last layer of ConvNet to induct concept description being assumed as Gaussian distribution, which is part of the training process. Second, the exploration and experience loss is designed for optimization, which adopts experience cache pool to speed up convergence. Experiments show that LSOVCF improves the ability of identifying unseen samples on cifar10, STL10, flower17 and ImageNet based on several backbones, from the classic VGG to the SOTA Ghostnet. The code is available at https: //github.com/elvintanhust/LSOVCF.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Yaohua Wang,et al.  Hierarchical Clustering With Hard-Batch Triplet Loss for Person Re-Identification , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jufeng Yang,et al.  Joint Image Emotion Classification and Distribution Learning via Deep Convolutional Neural Network , 2017, IJCAI.

[4]  Quoc V. Le,et al.  Searching for MobileNetV3 , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Jie Liang,et al.  Joint Acne Image Grading and Counting via Label Distribution Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Joshua B. Tenenbaum,et al.  Human-level concept learning through probabilistic program induction , 2015, Science.

[7]  Jian Yang,et al.  Selective Kernel Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Xiaogang Wang,et al.  Residual Attention Network for Image Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[10]  Vijay Vasudevan,et al.  Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Bernard Ghanem,et al.  SGAS: Sequential Greedy Architecture Search , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Colin Wei,et al.  Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss , 2019, NeurIPS.

[13]  Andrew Zisserman,et al.  Automatically Discovering and Learning New Visual Categories with Ranking Statistics , 2020, ICLR.

[14]  Chia-Wen Lin,et al.  CNN-Based Joint Clustering and Representation Learning with Feature Drift Compensation for Large-Scale Image Data , 2017, IEEE Transactions on Multimedia.

[15]  Bo Xu,et al.  Concept Learning through Deep Reinforcement Learning with Memory-Augmented Neural Networks , 2018, Neural Networks.

[16]  Chen Change Loy,et al.  Online Deep Clustering for Unsupervised Representation Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Quoc V. Le,et al.  AutoAugment: Learning Augmentation Strategies From Data , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Cheng Deng,et al.  Multi-Scale Fusion Subspace Clustering Using Similarity Constraint , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Andrew Zisserman,et al.  A Visual Vocabulary for Flower Classification , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[22]  Ali Farhadi,et al.  Unsupervised Deep Embedding for Clustering Analysis , 2015, ICML.

[23]  Kyungwoo Song,et al.  Hierarchically Clustered Representation Learning , 2019, AAAI.

[24]  Wei Wang,et al.  One-Shot Image Classification by Learning to Restore Prototypes , 2020, AAAI.

[25]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Xin Geng,et al.  Label Distribution Learning , 2013, 2013 IEEE 13th International Conference on Data Mining Workshops.

[27]  Po-Sen Huang,et al.  Towards Robust Image Classification Using Sequential Attention Models , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Graham W. Taylor,et al.  Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.

[29]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[30]  Dacheng Tao,et al.  Learning Unseen Concepts via Hierarchical Decomposition and Composition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[32]  Ildoo Kim,et al.  Spatially Attentive Output Layer for Image Classification , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Ali Farhadi,et al.  Learning Everything about Anything: Webly-Supervised Visual Concept Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Xiangyu Zhang,et al.  ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.

[35]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[36]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Jianyuan Guo,et al.  GhostNet: More Features From Cheap Operations , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Junmo Kim,et al.  Active Convolution: Learning the Shape of Convolution for Image Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Thomas L. Griffiths,et al.  Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies , 2013, NIPS.

[40]  Anne Jumonville,et al.  Encyclopedia of the Sciences of Learning , 2013 .

[41]  Takumi Kobayashi,et al.  Global Feature Guided Local Pooling , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).