Fundamental Limits and Tradeoffs in Invariant Representation Learning

Many machine learning applications, e.g., privacy-preserving learning, algorithmic fairness, and domain adaptation/generalization, involve learning so-called invariant representations that achieve two competing goals: to maximize information or accuracy with respect to a target while simultaneously maximizing invariance or independence with respect to a set of protected features (e.g., for fairness or privacy). Despite their abundant applications in these domains, theoretical understanding of the limits and tradeoffs of invariant representations is still severely lacking. In this paper, we provide an information-theoretic analysis of this general and important problem under both classification and regression settings. In both cases, we analyze the inherent tradeoffs between accuracy and invariance by providing a geometric characterization of the feasible region in the information plane, and we connect the geometric properties of this feasible region to the fundamental limitations of the tradeoff problem. In the regression setting, we further give a complete and exact characterization of the frontier between accuracy and invariance. Although our contributions are mainly theoretical, we also demonstrate a practical application of our results: certifying the suboptimality of certain representation learning algorithms in both classification and regression tasks. Our results shed new light on this fundamental problem by providing insights into the interplay between accuracy and invariance, and they may be useful in guiding the design of future representation learning algorithms.
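
As a rough illustration of the "feasible region in the information plane", one common way to formalize such a tradeoff is sketched below. The notation is an assumption made for this sketch and is not fixed by the abstract: X denotes the input, Y the target, A the protected feature, Z = g(X) a learned representation, I(·;·) mutual information, and ε a hypothetical invariance budget. The paper's own definitions may differ, for example by using other accuracy and invariance measures in the regression setting.

```latex
% Sketch under assumed notation: X input, Y target, A protected feature,
% Z = g(X) the representation, \epsilon >= 0 a hypothetical invariance budget.
\begin{align*}
  % Feasible region: all (accuracy, leakage) pairs achievable by some representation.
  \mathcal{R} &= \bigl\{\, \bigl(I(Y;Z),\; I(A;Z)\bigr) \;:\; Z = g(X) \text{ for some feature map } g \,\bigr\}, \\
  % One point on the frontier: best accuracy at a given level of invariance.
  \text{frontier at } \epsilon:\quad & \max_{Z = g(X)} \; I(Y;Z)
    \quad \text{subject to} \quad I(A;Z) \le \epsilon .
\end{align*}
```

Under this reading, perfect invariance corresponds to ε = 0, and a learned representation whose (accuracy, leakage) pair lies strictly inside the region, away from the frontier, would be certifiably suboptimal.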
