Functional Regularization for Representation Learning: A Unified Theoretical Perspective

Unsupervised and self-supervised learning approaches have become crucial tools for learning representations for downstream prediction tasks. While these approaches are widely used in practice and achieve impressive empirical gains, their theoretical understanding largely lags behind. Towards bridging this gap, we present a unifying perspective in which several such approaches can be viewed as imposing a regularization on the representation via a learnable function using unlabeled data. We propose a discriminative theoretical framework for analyzing the sample complexity of these approaches. Our sample complexity bounds show that, with hypothesis classes carefully chosen to exploit the structure in the data, such functional regularization can prune the hypothesis space and reduce the amount of labeled data needed. We then provide two concrete examples of functional regularization, one using auto-encoders and the other using masked self-supervision, and apply the framework to quantify the reduction in the sample complexity bound. We also provide complementary empirical results supporting our analysis.
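
To make the setup concrete, below is a minimal sketch (not the paper's implementation) of functional regularization with an auto-encoder: a shared encoder h maps inputs to a representation, a predictor head is trained on the labeled data, and a learnable decoder g regularizes h by reconstructing unlabeled inputs. All module names, dimensions, and the trade-off weight lam are illustrative assumptions.

```python
import torch
import torch.nn as nn

d_in, d_rep, n_classes, lam = 784, 64, 10, 0.5  # illustrative sizes and weight

h = nn.Sequential(nn.Linear(d_in, d_rep), nn.ReLU())  # shared representation
pred = nn.Linear(d_rep, n_classes)                    # supervised prediction head
g = nn.Linear(d_rep, d_in)                            # decoder = regularizing function

opt = torch.optim.Adam(
    [*h.parameters(), *pred.parameters(), *g.parameters()], lr=1e-3
)
ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()

# Placeholder data: a small labeled set and a larger unlabeled set.
x_lab = torch.randn(32, d_in)
y_lab = torch.randint(0, n_classes, (32,))
x_unlab = torch.randn(256, d_in)

for _ in range(100):
    opt.zero_grad()
    sup_loss = ce(pred(h(x_lab)), y_lab)      # labeled prediction loss
    reg_loss = mse(g(h(x_unlab)), x_unlab)    # reconstruction regularizer on unlabeled data
    (sup_loss + lam * reg_loss).backward()    # joint objective
    opt.step()
```

Masked self-supervision fits the same template: replace the reconstruction term with a loss that predicts masked-out coordinates of the input from the representation of the remaining ones, so the regularizing function again constrains the representation using only unlabeled data.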
