Deep Learning on Small Datasets without Pre-Training using Cosine Loss

Two things seem to be indisputable in the contemporary deep learning discourse: 1. The categorical cross-entropy loss after softmax activation is the method of choice for classification. 2. Training a CNN classifier from scratch on small datasets does not work well.In contrast to this, we show that the cosine loss function provides substantially better performance than crossentropy on datasets with only a handful of samples per class. For example, the accuracy achieved on the CUB- 200-2011 dataset without pre-training is by 30% higher than with the cross-entropy loss. Further experiments on other popular datasets confirm our findings. Moreover, we demonstrate that integrating prior knowledge in the form of class hierarchies is straightforward with the cosine loss and improves classification performance further.

[1]  Bram van Ginneken,et al.  A survey on deep learning in medical image analysis , 2017, Medical Image Anal..

[2]  Joshua B. Tenenbaum,et al.  Human-level concept learning through probabilistic program induction , 2015, Science.

[3]  Xing Ji,et al.  CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Gernot A. Fink,et al.  Evaluating Word String Embeddings and Loss Functions for CNN-Based Word Spotting , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[5]  Andrew Zisserman,et al.  Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[6]  Samy Bengio,et al.  Understanding deep learning requires rethinking generalization , 2016, ICLR.

[7]  Pietro Perona,et al.  Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[9]  Zhi Zhang,et al.  Bag of Tricks for Image Classification with Convolutional Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Tomas Pfister,et al.  Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Carlos D. Castillo,et al.  L2-constrained Softmax Loss for Discriminative Face Verification , 2017, ArXiv.

[12]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[13]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[14]  Wei Shen,et al.  Few-Shot Image Recognition by Predicting Parameters from Activations , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Hang Li,et al.  Meta-SGD: Learning to Learn Quickly for Few Shot Learning , 2017, ArXiv.

[17]  Frank Hutter,et al.  SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.

[18]  Miroslaw Bober,et al.  Improving Large-Scale Image Retrieval Through Robust Aggregation of Local Descriptors , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Yoshua Bengio,et al.  On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[21]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[22]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[25]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[27]  Joachim Denzler,et al.  Nonparametric Part Transfer for Fine-Grained Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Amaia Salvador,et al.  Learning Cross-Modal Embeddings for Cooking Recipes and Food Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Chen Sun,et al.  Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Bhiksha Raj,et al.  SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[32]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[33]  Joachim Denzler,et al.  The Whole Is More Than Its Parts? From Explicit to Implicit Pose Normalization , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Yongxin Yang,et al.  Frankenstein: Learning Deep Face Representations Using Small Data , 2016, IEEE Transactions on Image Processing.

[35]  Dong Liu,et al.  DADA: Deep Adversarial Data Augmentation for Extremely Low Data Regime Classification , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[36]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[37]  Xin Wang,et al.  TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Hong Yan,et al.  Directional Statistics-based Deep Metric Learning for Image Classification and Retrieval , 2018, Pattern Recognit..

[39]  Joachim Denzler,et al.  Hierarchy-Based Image Embeddings for Semantic Image Retrieval , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[40]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Yi Yang,et al.  Random Erasing Data Augmentation , 2017, AAAI.

[42]  Alexei A. Efros,et al.  Improving Generalization via Scalable Neighborhood Component Analysis , 2018, ECCV.

[43]  Yang Song,et al.  Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[45]  Baoyuan Wu,et al.  Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual Representation Learning , 2019, IEEE Access.

[46]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[47]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[48]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[49]  Joachim Denzler,et al.  Deep Learning is not a Matter of Depth but of Good Training 1 , 2018 .

[50]  Tao Mei,et al.  Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[51]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.

[52]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[53]  Tao Qin,et al.  Query-level loss functions for information retrieval , 2008, Inf. Process. Manag..