SalfMix: A Novel Single Image-Based Data Augmentation Technique Using a Saliency Map

Modern data augmentation strategies such as Cutout, Mixup, and CutMix, have achieved good performance in image recognition tasks. Particularly, the data augmentation approaches, such as Mixup and CutMix, that mix two images to generate a mixed training image, could generalize convolutional neural networks better than single image-based data augmentation approaches such as Cutout. We focus on the fact that the mixed image can improve generalization ability, and we wondered if it would be effective to apply it to a single image. Consequently, we propose a new data augmentation method to produce a self-mixed image based on a saliency map, called SalfMix. Furthermore, we combined SalfMix with state-of-the-art two images-based approaches, such as Mixup, SaliencyMix, and CutMix, to increase the performance, called HybridMix. The proposed SalfMix achieved better accuracies than Cutout, and HybridMix achieved state-of-the-art performance on three classification datasets: CIFAR-10, CIFAR-100, and TinyImageNet-200. Furthermore, HybridMix achieved the best accuracy in object detection tasks on the VOC dataset, in terms of mean average precision.

[1]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[2]  Seong Joon Oh,et al.  CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Kaiming He,et al.  Exploring Randomly Wired Neural Networks for Image Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[5]  Quoc V. Le,et al.  EfficientDet: Scalable and Efficient Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Graham W. Taylor,et al.  Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.

[7]  Huchuan Lu,et al.  Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Jie Qin,et al.  ResizeMix: Mixing Data with Preserved Object Information and True Labels , 2020, ArXiv.

[10]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[11]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[12]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Yi Yang,et al.  Random Erasing Data Augmentation , 2017, AAAI.

[14]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[15]  Frank Hutter,et al.  A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets , 2017, ArXiv.

[16]  Hyuk-Jae Lee,et al.  Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Dacheng Tao,et al.  SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data , 2020, AAAI.

[18]  Seong-Whan Lee,et al.  Self-Augmentation: Generalizing Deep Networks to Unseen Classes for Few-Shot Learning , 2020, ArXiv.

[19]  Domonkos Varga,et al.  Multi-pooled Inception features for no-reference image quality assessment , 2020, Applied Sciences.

[20]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[21]  Junmo Kim,et al.  Deep Pyramidal Residual Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Hongyi Zhang,et al.  mixup: Beyond Empirical Risk Minimization , 2017, ICLR.

[23]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[24]  Thomas L. Griffiths,et al.  Human Uncertainty Makes Classification More Robust , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25]  TaeChoong Chung,et al.  SaliencyMix: A Saliency Guided Data Augmentation Strategy for Better Regularization , 2020, ICLR.

[26]  Larry S. Davis,et al.  AutoFocus: Efficient Multi-Scale Inference , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Yong Jae Lee,et al.  Real-time Instance Segmentation , 2019 .

[28]  Yong Jae Lee,et al.  YOLACT: Real-Time Instance Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Quoc V. Le,et al.  Searching for MobileNetV3 , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Sk Md Obaidullah,et al.  Classification of Geometric Forms in Mosaics Using Deep Neural Network , 2021, J. Imaging.

[32]  Haichao Zhang,et al.  Towards Adversarially Robust Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Kaiming He,et al.  Designing Network Design Spaces , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[35]  Alvaro Soto,et al.  Human detection using a mobile platform and novel features derived from a visual saliency mechanism , 2010, Image Vis. Comput..

[36]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[37]  Hyun Oh Song,et al.  Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup , 2020, ICML.