Learning Augmentation Network via Influence Functions

Data augmentation can significantly affect the generalization performance of an image classification model. However, it is typically applied by trial and error, and its effect on generalization cannot be predicted during training. This paper considers an influence function that predicts how the generalization performance, measured as validation loss, is affected by a particular augmented training sample. The influence function approximates the change in validation loss without actually retraining the model with and without that sample. Based on this function, a differentiable augmentation network is learned to transform each input training sample so as to reduce validation loss. The augmented sample is fed into the classification network, and its influence is approximated as a function of the parameters of the network's last fully-connected layer. The augmentation network's parameters are then learned by backpropagating this influence to it. Experimental results on CIFAR-10, CIFAR-100, and ImageNet show that the proposed method yields better generalization performance than conventional data augmentation methods.
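To make the training loop concrete, below is a minimal PyTorch sketch of the core idea: the validation-loss gradient and the augmented sample's training-loss gradient are taken with respect to the last fully-connected layer, their inner product approximates the influence, and that scalar is backpropagated into the augmentation network. For brevity the inverse Hessian in the influence formula is replaced by the identity; the actual method uses an inverse-Hessian-vector product, and all module and variable names here (`AugmentNet`, `SmallClassifier`, `influence_step`) are illustrative placeholders, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AugmentNet(nn.Module):
    """Differentiable augmentation network: adds a small learned perturbation."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())

    def forward(self, x):
        return (x + 0.1 * self.body(x)).clamp(0.0, 1.0)

class SmallClassifier(nn.Module):
    """Toy classifier; `fc` stands in for the last fully-connected layer."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.fc(self.features(x))

def influence_step(aug_net, clf, x_tr, y_tr, x_val, y_val, opt_aug):
    fc_params = list(clf.fc.parameters())

    # 1) Validation-loss gradient w.r.t. the last FC layer, held constant.
    g_val = torch.autograd.grad(F.cross_entropy(clf(x_val), y_val), fc_params)
    g_val = [g.detach() for g in g_val]

    # 2) Training-loss gradient of the augmented sample w.r.t. the same layer;
    #    create_graph=True keeps it differentiable w.r.t. the augmentation net.
    g_tr = torch.autograd.grad(
        F.cross_entropy(clf(aug_net(x_tr)), y_tr), fc_params, create_graph=True)

    # 3) Influence approximation: up-weighting the augmented sample changes the
    #    validation loss by roughly -g_val^T H^{-1} g_tr; H is taken as I here.
    influence = -sum((gv * gt).sum() for gv, gt in zip(g_val, g_tr))

    # 4) Update only the augmentation network so that the predicted change in
    #    validation loss becomes as negative as possible.
    opt_aug.zero_grad()
    influence.backward()
    opt_aug.step()
    return influence.item()

# Usage on random data, just to show that the signal flows end to end.
aug_net, clf = AugmentNet(), SmallClassifier()
opt_aug = torch.optim.Adam(aug_net.parameters(), lr=1e-3)
x_tr, y_tr = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
x_val, y_val = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
print(influence_step(aug_net, clf, x_tr, y_tr, x_val, y_val, opt_aug))
```

In practice this step would be interleaved with ordinary classifier updates on the augmented samples, and the identity-Hessian shortcut would be replaced by a stochastic inverse-Hessian-vector estimate restricted to the last layer.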
