Invariant and Sufficient Supervised Representation Learning

Improving the generalization of neural networks under domain shift is an important and challenging task in computer vision. Learning a representation that is invariant across domains is a standard approach in the literature. In this paper, we propose invariant and sufficient supervised representation learning (ISSRL), an approach that learns a domain-invariant representation while preserving the information needed for downstream tasks. To this end, we formulate ISSRL as finding a nonlinear map $\boldsymbol{g}$ such that $Y\perp X\vert \boldsymbol{g}(X)$ and $(Y,\boldsymbol{g}(X))\perp D$ at the population level, where $D$ is the domain label and $(X, Y)$ is the paired data sampled from the labeled domains. We use distance correlation to characterize (conditional) independence. At the sample level, we construct a novel loss function from an unbiased empirical version of distance correlation. We train the representation map by parameterizing it with deep neural networks. Both simulation studies and real-data evaluations show that ISSRL outperforms state-of-the-art methods on out-of-distribution generalization. The PyTorch code for ISSRL is available at https://github.com/CaC033/ISSRL.
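
To make the sample-level construction concrete, the following is a minimal NumPy sketch of an unbiased squared distance covariance estimator based on U-centered pairwise distance matrices, together with the distance correlation derived from it. It is an illustration under stated assumptions, not the implementation in the ISSRL repository: the function names (`pairwise_dist`, `u_center`, `dcov2_unbiased`, `dcor2`, `issrl_style_loss`), the trade-off weight `lam`, and the particular way the two terms are combined are all hypothetical. The sketch assumes that sufficiency is encouraged by making the dependence between $Y$ and $\boldsymbol{g}(X)$ large, and that invariance is encouraged by making the dependence between $(Y,\boldsymbol{g}(X))$ and $D$ small; the loss actually used in the paper may differ.

```python
import numpy as np


def pairwise_dist(z):
    """Euclidean distance matrix between the rows of z (1-D input is one column)."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def u_center(d):
    """U-center a symmetric distance matrix; the diagonal is set to zero."""
    n = d.shape[0]
    row = d.sum(axis=1, keepdims=True) / (n - 2)
    col = d.sum(axis=0, keepdims=True) / (n - 2)
    total = d.sum() / ((n - 1) * (n - 2))
    u = d - row - col + total
    np.fill_diagonal(u, 0.0)
    return u


def dcov2_unbiased(x, y):
    """Unbiased estimate of the squared distance covariance (needs n >= 4)."""
    a = u_center(pairwise_dist(x))
    b = u_center(pairwise_dist(y))
    n = a.shape[0]
    return (a * b).sum() / (n * (n - 3))


def dcor2(x, y):
    """Squared distance correlation built from the unbiased covariance estimates."""
    denom = np.sqrt(dcov2_unbiased(x, x) * dcov2_unbiased(y, y))
    return dcov2_unbiased(x, y) / denom


def issrl_style_loss(rep, y, d, lam=1.0):
    """Hypothetical objective: reward dependence of rep on y and
    penalize dependence of (y, rep) on the domain label d.

    rep: (n, k) representation g(X); y: (n,) or (n, p) targets;
    d: (n,) or (n, m) domain labels (e.g. one-hot). lam is an assumed weight.
    """
    y2 = np.asarray(y, dtype=float).reshape(len(rep), -1)
    joint = np.concatenate([y2, np.asarray(rep, dtype=float)], axis=1)
    return -dcov2_unbiased(rep, y) + lam * dcov2_unbiased(joint, d)
```

In a training loop, the same computation would be written with differentiable tensor operations so that gradients flow back to the parameters of $\boldsymbol{g}$; the NumPy version here only illustrates the estimator itself. Because the estimator is unbiased, it can take small negative values when the two arguments are independent.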
