Revisiting Pre-training: An Efficient Training Method for Image Classification.

The training method of repetitively feeding all samples into a pre-defined network for image classification has been widely adopted by current state-of-the-art. In this work, we provide a new method, which can be leveraged to train classification networks in a more efficient way. Starting with a warm-up step, we propose to continually repeat a Drop-and-Pick (DaP) learning strategy. In particular, we drop those easy samples to encourage the network to focus on studying hard ones. Meanwhile, by picking up all samples periodically during training, we aim to recall the memory of the networks to prevent catastrophic forgetting of previously learned knowledge. Our DaP learning method can recover 99.88%, 99.60%, 99.83% top-1 accuracy on ImageNet for ResNet-50, DenseNet-121, and MobileNet-V1 but only requires 75% computation in training compared to those using the classic training schedule. Furthermore, our pre-trained models are equipped with strong knowledge transferability when used for downstream tasks, especially for hard cases. Extensive experiments on object detection, instance segmentation and pose estimation can well demonstrate the effectiveness of our DaP training method.

[1]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[2]  Charles Blundell,et al.  Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.

[3]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[4]  Xiangyu Zhang,et al.  ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.

[5]  Zhiqiang Shen,et al.  DSOD: Learning Deeply Supervised Object Detectors from Scratch , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Kaiming He,et al.  Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.

[9]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Jinjun Xiong,et al.  Revisiting RCNN: On Awakening the Classification Power of Faster RCNN , 2018, ECCV.

[13]  Alexander Sergeev,et al.  Horovod: fast and easy distributed deep learning in TensorFlow , 2018, ArXiv.

[14]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Nikolaos Papanikolopoulos,et al.  Multi-class active learning for image classification , 2009, CVPR.

[16]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Zhiqiang Shen,et al.  Improving Object Detection from Scratch via Gated Feature Reuse , 2017, BMVC.

[18]  Yichen Wei,et al.  Integral Human Pose Regression , 2017, ECCV.

[19]  Jinjun Xiong,et al.  TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection , 2018, ECCV.

[20]  Jinjun Xiong,et al.  Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection , 2018, ArXiv.

[21]  Shuicheng Yan,et al.  Learning Object Detectors from Scratch with Gated Recurrent Feature Pyramids , 2017, ArXiv.

[22]  Benjamin Van Roy,et al.  Deep Exploration via Bootstrapped DQN , 2016, NIPS.

[23]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[25]  Yuanzhou Yang,et al.  Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes , 2018, ArXiv.

[26]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[27]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[28]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[29]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Kaiming He,et al.  Exploring the Limits of Weakly Supervised Pretraining , 2018, ECCV.

[31]  Hao Wu,et al.  Mixed Precision Training , 2017, ICLR.

[32]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[33]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Yichen Wei,et al.  Simple Baselines for Human Pose Estimation and Tracking , 2018, ECCV.

[35]  Ngoc Thang Vu,et al.  Densely Connected Convolutional Networks for Speech Recognition , 2018, ITG Symposium on Speech Communication.