Topologically Densified Distributions

We study regularization in the context of small sample-size learning with over-parameterized neural networks. Specifically, we shift focus from architectural properties, such as norms on the network weights, to properties of the internal representations before a linear classifier. In particular, we impose a topological constraint on samples drawn from the probability measure induced in that space. This provably leads to mass concentration effects around the representations of the training instances, a property beneficial for generalization. By leveraging previous work on imposing topological constraints in a neural network setting, we provide empirical evidence across various vision benchmarks to support our claim of improved generalization.
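To make the idea of a topological constraint on latent representations concrete, the following is a minimal sketch (not the authors' reference implementation) of a 0-dimensional persistence penalty on the per-class representations of a mini-batch, assuming a PyTorch setting. It exploits the fact that the 0-dimensional death times of a Vietoris-Rips filtration coincide with the edge lengths of a minimum spanning tree of the pairwise-distance graph; pushing those death times toward a target value concentrates samples around the training representations. The names `topological_penalty`, `beta`, and `lam` are illustrative, not taken from the paper.

```python
# Hedged sketch: connectivity-style topological penalty on latent codes.
import torch
from scipy.sparse.csgraph import minimum_spanning_tree


def topological_penalty(z: torch.Tensor, beta: float) -> torch.Tensor:
    """z: (n, d) latent codes of one class in the mini-batch."""
    # Pairwise Euclidean distances (differentiable w.r.t. z).
    dists = torch.cdist(z, z)

    # The MST structure is computed on a detached copy; gradients then
    # flow through the selected edge lengths of `dists`. These edge
    # lengths are the 0-dimensional persistence death times.
    mst = minimum_spanning_tree(dists.detach().cpu().numpy())
    rows, cols = mst.nonzero()
    rows = torch.as_tensor(rows, dtype=torch.long, device=z.device)
    cols = torch.as_tensor(cols, dtype=torch.long, device=z.device)
    death_times = dists[rows, cols]

    # Penalize deviation of each death time from the target scale beta.
    return (death_times - beta).abs().mean()


# Hypothetical usage inside a training step, with z the pre-classifier
# representations of a batch and y the labels:
#
#   loss = cross_entropy(classifier(z), y)
#   for c in y.unique():
#       loss = loss + lam * topological_penalty(z[y == c], beta=0.5)
```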
