How Important is Importance Sampling for Deep Budgeted Training?

Long iterative training processes for Deep Neural Networks (DNNs) are commonly required to achieve state-of-the-art performance in many computer vision tasks. Importance sampling approaches might play a key role in budgeted training regimes, i.e. when limiting the number of training iterations. These approaches aim at dynamically estimating the importance of each sample to focus training on the most relevant samples and speed up convergence. This work explores this paradigm and how a budget constraint interacts with importance sampling approaches and data augmentation techniques. We show that under budget restrictions, importance sampling approaches do not provide a consistent improvement over uniform sampling. We suggest that, given a specific budget, the best course of action is to disregard sample importance and introduce adequate data augmentation; e.g. when reducing the budget to 30% on CIFAR-10/100, RICAP data augmentation maintains accuracy, while importance sampling does not. We conclude from our work that DNNs under budget restrictions benefit greatly from variety in the training set and that finding the right samples to train on is not the most effective strategy when balancing high performance with low computational requirements. Source code available at: https://git.io/JKHa3
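The importance sampling schemes discussed above re-weight the probability of selecting each training sample according to an estimate of its current usefulness, typically its loss. The snippet below is a minimal sketch of that idea rather than the exact scheme evaluated in the paper: it keeps per-sample loss estimates, mixes a loss-proportional distribution with a uniform one, and draws batches under a fixed iteration budget. The function name `importance_probs`, the smoothing constant, and the synthetic loss updates are illustrative assumptions.

```python
import numpy as np

def importance_probs(per_sample_loss, smooth=0.5):
    """Selection probabilities proportional to the current loss estimates,
    mixed with a uniform distribution for stability (a common smoothing choice)."""
    p = per_sample_loss / per_sample_loss.sum()
    u = np.full_like(p, 1.0 / len(p))
    return smooth * u + (1.0 - smooth) * p

rng = np.random.default_rng(0)
n_samples, batch_size = 1000, 32
budget_iters = 300  # e.g. a 30% budget of a nominal 1000-iteration run

# Running per-sample loss estimates, initialised optimistically high and
# refreshed whenever a sample is visited.
loss_estimate = np.ones(n_samples)

for it in range(budget_iters):
    probs = importance_probs(loss_estimate)
    batch = rng.choice(n_samples, size=batch_size, replace=False, p=probs)
    # ... forward/backward pass on `batch` would go here ...
    # Update the stored estimates with freshly computed per-sample losses
    # (simulated here with random values for illustration).
    loss_estimate[batch] = rng.random(batch_size)
```

Uniform sampling, the baseline the paper compares against, corresponds to replacing `probs` with the uniform distribution; data augmentation such as RICAP or mixup would be applied to each drawn batch independently of the sampling scheme.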
