Joint Supervision for Discriminative Feature Learning in Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have achieved excellent results in tasks such as face verification and image classification. The softmax loss is the typical supervision signal for training CNNs on multi-class classification, and it forces the learned features to be separable. However, separability alone does not make the features discriminative. To efficiently encourage intra-class compactness and inter-class separability of the learned features, this paper proposes an H-contrastive loss, derived from the contrastive loss, for multi-class classification tasks. Under the joint supervision of the softmax loss, the H-contrastive loss, and the center loss, we train a robust CNN that enhances the discriminative power of the deeply learned features across classes. This joint supervision achieves state-of-the-art accuracy on several multi-class classification datasets, including MNIST, CIFAR-10, and CIFAR-100.
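A minimal sketch of the joint objective, assuming the three losses are combined as a weighted sum; the abstract does not give the exact formulation, so the weights \lambda and \mu and the precise form of the H-contrastive term are assumptions:

    \mathcal{L} = \mathcal{L}_S + \lambda \mathcal{L}_C + \mu \mathcal{L}_H

Here \mathcal{L}_S is the standard softmax (cross-entropy) loss; \mathcal{L}_C = \frac{1}{2} \sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2 is the center loss of Wen et al., where x_i is the deep feature of the i-th sample and c_{y_i} the learned center of its class; and \mathcal{L}_H is the proposed H-contrastive term. If it follows the classical contrastive loss, a same-class feature pair contributes \frac{1}{2} \lVert x_i - x_j \rVert_2^2 while a different-class pair contributes \frac{1}{2} \max(0, \, m - \lVert x_i - x_j \rVert_2)^2 for a margin m, so same-class features are pulled together and different-class features are pushed at least m apart.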
