AutoDO: Robust AutoAugment for Biased Data with Label Noise via Scalable Probabilistic Implicit Differentiation

AutoAugment [19] has sparked interest in automated augmentation methods for deep learning models. These methods estimate image transformation policies for train data that improve generalization to test data. While recent papers have evolved toward decreasing policy search complexity, we show that those methods are not robust when applied to biased and noisy data. To overcome these limitations, we reformulate AutoAugment as a generalized automated dataset optimization (AutoDO) task that minimizes the distribution shift between test data and the distorted train dataset. In our AutoDO model, we explicitly estimate a set of per-point hyperparameters to flexibly change the distribution of train data. In particular, we include hyperparameters for augmentation, loss weights, and soft-labels that are jointly estimated using implicit differentiation. We develop a theoretical probabilistic interpretation of this framework using Fisher information and show that its complexity scales linearly with the dataset size. Our experiments on SVHN, CIFAR-10/100, and ImageNet classification show up to 9.3% improvement for biased datasets with label noise compared to prior methods and, importantly, up to 36.6% gain for underrepresented SVHN classes.
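To make the joint estimation concrete, below is a minimal PyTorch sketch of computing implicit-differentiation hypergradients for per-point soft-labels and loss weights. This is not the authors' implementation: the truncated Neumann-series inverse-Hessian approximation is the one popularized by Lorraine et al. [30], names such as hypergradient, y_logits, and w_logits are illustrative assumptions, and per-point augmentation hyperparameters are omitted for brevity.

import torch
import torch.nn.functional as F

def train_loss(model, x, y_soft, loss_w):
    # Weighted soft-label cross-entropy over a train batch:
    # each point i carries its own soft-label y_soft[i] and weight loss_w[i].
    logp = F.log_softmax(model(x), dim=1)
    per_point = -(y_soft * logp).sum(dim=1)
    return (loss_w * per_point).mean()

def hypergradient(model, y_logits, w_logits, x_tr, x_val, y_val,
                  terms=5, alpha=0.1):
    # Implicit-function-theorem hypergradient:
    #   dL_val/dh = -(dL_val/dw) H^{-1} d^2 L_tr / (dw dh),
    # with H^{-1} v approximated by the truncated Neumann series
    #   H^{-1} v ~ alpha * sum_k (I - alpha*H)^k v,
    # which converges when alpha is small enough that ||I - alpha*H|| < 1.
    params = [p for p in model.parameters() if p.requires_grad]
    L_tr = train_loss(model, x_tr, F.softmax(y_logits, dim=1),
                      torch.sigmoid(w_logits))
    g_tr = torch.autograd.grad(L_tr, params, create_graph=True)
    L_val = F.cross_entropy(model(x_val), y_val)
    v = [g.detach() for g in torch.autograd.grad(L_val, params)]
    p = [t.clone() for t in v]
    acc = [t.clone() for t in v]
    for _ in range(terms):
        # Hessian-vector product via double backward through g_tr.
        Hp = torch.autograd.grad(g_tr, params, grad_outputs=p, retain_graph=True)
        p = [pi - alpha * hi for pi, hi in zip(p, Hp)]
        acc = [ai + pi for ai, pi in zip(acc, p)]
    acc = [alpha * ai for ai in acc]
    # Mixed second derivative contracted with acc, with the IFT sign flip.
    g_h = torch.autograd.grad(g_tr, [y_logits, w_logits], grad_outputs=acc)
    return [-g for g in g_h]

In a full training loop, the returned hypergradients would drive an optimizer step on y_logits and w_logits while the model itself is trained on the weighted soft-label loss; because everything above is formed from vector products, cost grows linearly with the number of per-point hyperparameters, consistent with the scalability claim in the abstract.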

[1] Laurens van der Maaten, et al. Learning Discriminative Fisher Kernels, 2011, ICML.

[2] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization, 2017, ICLR.

[3] Naser Damer, et al. A Comprehensive Study on Face Recognition Biases Beyond Demographics, 2021, IEEE Transactions on Technology and Society.

[4] Quoc V. Le, et al. Randaugment: Practical automated data augmentation with a reduced search space, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[5] Iasonas Kokkinos, et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Jonathan T. Barron, et al. Continuously Differentiable Exponential Linear Units, 2017, arXiv.

[7] Seong Joon Oh, et al. CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features, 2019, IEEE/CVF International Conference on Computer Vision (ICCV).

[8] Yiming Yang, et al. DARTS: Differentiable Architecture Search, 2018, ICLR.

[9] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, arXiv.

[10] Qiang Wang, et al. Adversarial AutoAugment, 2019, ICLR.

[11] Percy Liang, et al. Understanding Black-box Predictions via Influence Functions, 2017, ICML.

[12] Edward R. Dougherty, et al. Effect of separate sampling on classification accuracy, 2014, Bioinformatics.

[13] Wei Wu, et al. Online Hyper-Parameter Learning for Auto-Augmentation Strategy, 2019, IEEE/CVF International Conference on Computer Vision (ICCV).

[14] Kun Yi, et al. Probabilistic End-To-End Noise Correction for Learning With Noisy Labels, 2019, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Fabian Pedregosa, et al. Hyperparameter optimization with approximate gradient, 2016, ICML.

[16] Ion Stoica, et al. Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules, 2019, ICML.

[17] Kiyoharu Aizawa, et al. Joint Optimization Framework for Learning with Noisy Labels, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Ben Poole, et al. Categorical Reparameterization with Gumbel-Softmax, 2016, ICLR.

[19] Quoc V. Le, et al. AutoAugment: Learning Augmentation Policies from Data, 2018, arXiv.

[20] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Communications of the ACM.

[21] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.

[22] David Haussler, et al. Exploiting Generative Models in Discriminative Classifiers, 1998, NIPS.

[23] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.

[24] Sotaro Tsukizawa, et al. Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision, 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.

[26] Hideki Nakayama, et al. Faster AutoAugment: Learning Augmentation Strategies using Backpropagation, 2019, ECCV.

[27] Р. Ю. Чуйков, et al. Vehicle detection in images of suburban highways based on the Single Shot MultiBox Detector method, 2017.

[28] Wei Liu, et al. SSD: Single Shot MultiBox Detector, 2015, ECCV.

[29] Sergey Levine, et al. Meta-Learning with Implicit Gradients, 2019, NeurIPS.

[30] David Duvenaud, et al. Optimizing Millions of Hyperparameters by Implicit Differentiation, 2019, AISTATS.

[31] D. Idczak. A global implicit function theorem and its applications to functional equations, 2014.

[32] Pedro M. Domingos, et al. Every Model Learned by Gradient Descent Is Approximately a Kernel Machine, 2020, arXiv.

[33] Yi Yang, et al. Random Erasing Data Augmentation, 2017, AAAI.

[34] Dmytro Mishkin, et al. Kornia: an Open Source Differentiable Computer Vision Library for PyTorch, 2019, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[35] Timothy Hospedales, et al. DADA: Differentiable Automatic Data Augmentation, 2020, ECCV.

[36] Taesup Kim, et al. Fast AutoAugment, 2019, NeurIPS.

[37] M. Galewski, et al. On a global implicit function theorem for locally Lipschitz maps via non-smooth critical point theory, 2017, arXiv:1704.04280.

[38] Edward R. Dougherty, et al. Effect of separate sampling on classification and the minimax criterion, 2013, IEEE International Workshop on Genomic Signal Processing and Statistics.

[39] Patrice Y. Simard, et al. Best practices for convolutional neural networks applied to visual document analysis, 2003, Seventh International Conference on Document Analysis and Recognition.

[40] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.

[41] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.

[42] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43] Mario A. Nascimento, et al. UniformAugment: A Search-free Probabilistic Data Augmentation Approach, 2020, arXiv.

[44] Gustavo Carneiro, et al. A Bayesian Data Augmentation Approach for Learning Deep Models, 2017, NIPS.

[45] Vladlen Koltun, et al. Deep Equilibrium Models, 2019, NeurIPS.

[46] M. Cristea. On global implicit function theorem, 2017.

[47] Eric-Jan Wagenmakers, et al. A Tutorial on Fisher Information, 2017, arXiv:1705.01064.