A Kernel Theory of Modern Data Augmentation

Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding data augmentation. We approach this from two directions: First, we provide a general model of augmentation as a Markov process, and show that kernels appear naturally with respect to this model, even when we do not employ kernel classification. Next, we analyze more directly the effect of augmentation on kernel classifiers, showing that data augmentation can be approximated by first-order feature averaging and second-order variance regularization components. These frameworks both serve to illustrate the ways in which data augmentation affects the downstream learning model, and the resulting analyses provide novel connections between prior work in invariant kernels, tangent propagation, and robust optimization. Finally, we provide several proof-of-concept applications showing that our theory can be useful for accelerating machine learning workflows, such as reducing the amount of computation needed to train using augmented data, and predicting the utility of a transformation prior to training.
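To make the second direction concrete, here is a minimal sketch, in our own notation rather than the paper's, of the kind of expansion the abstract describes. For a kernel model with feature map $\phi$, weights $w$, and a twice-differentiable loss $\ell$, a second-order Taylor expansion of the expected augmented loss around the transformation-averaged feature $\bar{\phi}(x) = \mathbb{E}_{t \sim T}[\phi(t(x))]$ gives

$$
\mathbb{E}_{t \sim T}\!\left[\ell\!\left(w^{\top}\phi(t(x)),\, y\right)\right]
\;\approx\;
\underbrace{\ell\!\left(w^{\top}\bar{\phi}(x),\, y\right)}_{\text{feature averaging}}
\;+\;
\underbrace{\tfrac{1}{2}\,\ell''\!\left(w^{\top}\bar{\phi}(x),\, y\right)\, w^{\top}\,\mathrm{Cov}_{t \sim T}\!\left[\phi(t(x))\right]\, w}_{\text{variance regularization}},
$$

where the first-order term vanishes because the expansion is taken at the mean. The first term corresponds to training on augmentation-averaged features; the second penalizes the variance of the model's output under the transformation distribution, which is what connects augmentation to robust optimization.

The feature-averaging component also suggests how augmented training can be made cheaper: replace each example's feature map with its average over a few sampled transformations, rather than training on every augmented copy. The sketch below illustrates this idea with random Fourier features; it is an assumption-laden illustration (the transformation `jitter` and names such as `n_aug` are ours), not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): approximate augmented training by
# averaging each point's feature map over sampled class-preserving transforms.
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(X, W, b):
    """Random Fourier features approximating an RBF kernel."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def augmentation_averaged_features(X, transform, W, b, n_aug=8):
    """First-order approximation: average feature maps over sampled transforms."""
    feats = [random_fourier_features(transform(X, rng), W, b) for _ in range(n_aug)]
    return np.mean(feats, axis=0)

def jitter(X, rng, scale=0.1):
    """Toy stand-in for a class-preserving transformation: additive noise."""
    return X + scale * rng.standard_normal(X.shape)

d, D = 20, 256                      # input dimension, number of random features
W = rng.standard_normal((d, D))     # RBF frequencies
b = rng.uniform(0.0, 2 * np.pi, D)  # random phases

X = rng.standard_normal((100, d))
Phi = augmentation_averaged_features(X, jitter, W, b)
K = Phi @ Phi.T                     # kernel matrix between augmentation-averaged points
```

Any downstream kernel classifier can then be trained on `K` (or on `Phi` directly) at the cost of a single pass over the data, rather than one pass per augmented copy.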
