Invariant Information Bottleneck for Domain Generalization

Abstract

Invariant risk minimization (IRM) has recently emerged as a promising alternative for domain generalization. Nevertheless, its loss function is difficult to optimize for nonlinear classifiers, and the original optimization objective can fail in the presence of pseudo-invariant features and geometric skews. Inspired by IRM, in this paper we propose a novel formulation for domain generalization, dubbed invariant information bottleneck (IIB). IIB aims to minimize invariant risks for nonlinear classifiers while simultaneously mitigating the impact of pseudo-invariant features and geometric skews. Specifically, we first present a novel formulation of invariant causal prediction via mutual information. We then adopt the variational formulation of mutual information to develop a tractable loss function for nonlinear classifiers. To overcome the failure modes of IRM, we propose to minimize the mutual information between the inputs and the corresponding representations. IIB significantly outperforms IRM on synthetic datasets where pseudo-invariant features and geometric skews occur, demonstrating the effectiveness of the proposed formulation in overcoming the failure modes of IRM. Furthermore, experiments on DomainBed show that IIB outperforms 13 baselines by 0.9% on average across 7 real datasets.

Introduction

In most statistical machine learning algorithms, a fundamental assumption is that the training data and test data are independently and identically distributed (i.i.d.). However, the data in many real-world applications are not i.i.d.: distributional shifts are ubiquitous. Under such circumstances, classic statistical learning paradigms with strong generalization guarantees, e.g., empirical risk minimization (ERM) (Vapnik 1999), often fail to generalize because the i.i.d. assumption is violated. It has been widely observed that the performance of a model often deteriorates dramatically when it is faced with samples from a different domain, even under a mild distributional shift (Arjovsky et al. 2019). On the other hand, collecting training samples from all possible future scenarios is essentially infeasible. Hence, understanding and improving the generalization of models on out-of-distribution data is crucial.

Domain generalization (DG), which aims to learn a model from several different domains so that it generalizes to unseen related domains, has recently received much attention. From the perspective of representation learning, there are several paradigms towards this goal, including invariant representation learning (Muandet, Balduzzi, and Schölkopf 2013; Zhao et al. 2018; Tachet des Combes et al. 2020), invariant causal prediction (Arjovsky et al. 2019; Krueger et al. 2020b), meta-learning (Balaji, Sankaranarayanan, and Chellappa 2018; Du et al. 2020), and feature disentanglement (Du et al. 2020; Peng et al. 2019). Of particular interest are the invariant learning methods. Some early works, e.g., DANN (Ganin et al. 2017) and CDANN (Long et al. 2018), aim at finding representations that are invariant across domains. Nevertheless, learning invariant representations fails for domain adaptation or generalization when the marginal label distributions change between source and target domains (Zhao et al. 2019a). Recently, invariant causal prediction (ICP) and its follow-up, invariant risk minimization (IRM), have attracted much interest.
ICP assumes that the data are generated according to a structural causal model (SCM) (Pearl 2010): the causal mechanism of the data-generating process is the same across domains, while the interventions may vary from domain to domain. Under these generative assumptions, IRM (Arjovsky et al. 2019) attempts to learn an optimal classifier that is invariant across domains; the argument is that, under the SCM assumption, such a classifier can generalize across domains. Despite the intuitive motivation, IRM falls short in several respects. First, the loss function proposed in (Arjovsky et al. 2019) is difficult to optimize when the classifier is nonlinear. Furthermore, it has been shown that IRM fails when pseudo-invariant features (Rosenfeld, Ravikumar, and Risteski 2020) or geometric skews (Nagarajan, Andreassen, and Neyshabur 2021) exist. Under such circumstances, the classifier utilizes both the causal and the spurious features, violating invariant causal prediction. To address the first issue, we propose an information-theoretic formulation of invariant causal prediction and adopt a variational approximation to ease the optimization. To tackle the second issue, we observe that exploiting pseudo-invariant features or geometric skews inevitably increases the mutual information between the inputs and the representations. Hence, to mitigate the impact of pseudo-invariant features and geometric skews, we propose to constrain this mutual information, which naturally leads to an information bottleneck formulation. Our empirical results show that the proposed approach effectively improves accuracy when pseudo-invariant features and geometric skews exist.

Contributions: We propose a novel information-theoretic formulation for domain generalization, termed invariant information bottleneck (IIB). IIB aims at minimizing invariant risks while simultaneously mitigating the impact of pseudo-invariant features and geometric skews. Specifically, our contributions can be summarized as follows:

(1) We propose a novel formulation of invariant causal prediction via mutual information, and adopt a variational approximation to develop tractable loss functions for nonlinear classifiers.

(2) To mitigate the impact of pseudo-invariant features and geometric skews, and inspired by the information bottleneck principle, we propose to constrain the mutual information between the inputs and the representations (see the sketch after this list). The effectiveness is verified in the synthetic failure-mode experiments (Ahuja et al. 2021; Nagarajan, Andreassen, and Neyshabur 2021), where IIB significantly improves on IRM.

(3) Empirically, we analyze the performance of IIB with extensive experiments on both synthetic and large-scale benchmarks. We show that IIB eliminates spurious information better than existing DG methods, and achieves consistent improvements on 7 datasets, by 0.7% on DomainBed (Gulrajani and Lopez-Paz 2020).
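
To make the bottleneck constraint concrete, the following is a minimal sketch of how the mutual information I(X; Z) can be upper-bounded variationally, in the spirit of the deep variational information bottleneck (Alemi et al. 2017). The encoder architecture, dimensions, and the trade-off weight beta below are illustrative assumptions rather than the exact implementation used in our experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianEncoder(nn.Module):
    """Stochastic encoder q(z|x); its KL divergence to N(0, I) upper-bounds I(X; Z)."""
    def __init__(self, in_dim: int, z_dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)

    def forward(self, x):
        h = self.body(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar

def ib_loss(classifier, z, mu, logvar, y, beta):
    # Cross-entropy of q(y|z) is a variational surrogate for maximizing I(Z; Y).
    ce = F.cross_entropy(classifier(z), y)
    # KL(q(z|x) || N(0, I)), averaged over the batch: an upper bound on I(X; Z).
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=1).mean()
    return ce + beta * kl
```

Here beta trades prediction accuracy against compression of the representation; larger values squeeze out information not needed for the label, which is what penalizes pseudo-invariant features and geometric skews that inflate I(X; Z).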

Related Work

Domain Generalization

Existing DG methods can be divided into three categories. (1) Data manipulation: machine learning models typically rely on diverse training data to enhance generalization. Data manipulation/augmentation methods (Nazari and Kovashka 2020; Riemer et al. 2019) increase the diversity of the existing training data with operations such as flipping and rotation. Domain randomization (Borrego et al. 2018; Yue et al. 2019; Zakharov, Kehl, and Ilic 2019) provides more complex operations for image data, such as altering the location or texture of objects and replicating or resizing objects. In addition, some methods (Riemer et al. 2019; Qiao, Zhao, and Peng 2020; Liu et al. 2018; Truong et al. 2019; Zhao et al. 2019b) exploit generated samples to enhance the model's generalization ability. (2) Ensemble learning methods (Mancini et al. 2018; Segù, Tonioni, and Tombari 2020) assume that any sample in the test domain can be regarded as an integrated sample from the multiple source domains, so the overall prediction should be inferred by combining the models trained on the different domains. (3) Meta-learning aims at learning a general model from multiple domains. For domain generalization, MLDG (Li et al. 2018a) splits the data from multiple domains into meta-train and meta-test sets to simulate domain shift and thereby learn general representations. In particular, MetaReg (Balaji, Sankaranarayanan, and Chellappa 2018) learns a meta-regularizer for the classifier, and Meta-VIB (Du et al. 2020) learns to generate the classifier weights in the meta-learning paradigm by regularizing the KL divergence between the marginal distributions of representations of the same category from different domains.

Mutual Information-Based Domain Adaptation

Domain adaptation is an important topic in transfer learning (Long et al. 2015; Ganin et al. 2016; Tzeng et al. 2017; Long et al. 2018; Zhao et al. 2021, 2020c,b; Li et al. 2020a), and mutual information-based approaches have been widely applied in this area. The key idea is to learn a domain-invariant representation that is informative about the label, which can be formulated as (Zhao et al. 2020a; Li et al. 2020b)

    max_Z I(Z; Y) − λ I(Z; A),    (1)

where A is the domain identity, Z denotes the representation, and Y denotes the label. Commonly adopted implementations of (1) include DANN (Ganin et al. 2017) and CDANN (Long et al. 2018), which are also often used as baselines in domain generalization (Gulrajani and Lopez-Paz 2020).

Invariant Risk Minimization

The approaches above enforce invariance of the learned representations. In contrast, invariant risk minimization (IRM) enforces invariance of the feature-conditioned label distribution. Specifically, IRM seeks an invariant causal predictor Φ such that E[Y^e | Φ(X^e)] = E[Y^{e'} | Φ(X^{e'})] for all e, e' ∈ E. The objective of IRM is the bilevel program

    min_{w, Φ} ∑_{e ∈ E_train} R^e(w ∘ Φ)   s.t.   w ∈ argmin_{w̄} R^e(w̄ ∘ Φ) for all e ∈ E_train,

where R^e denotes the risk in domain e.
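
Because the bilevel program above is difficult to optimize directly, Arjovsky et al. (2019) relax it in practice to the IRMv1 penalty, which measures how far a fixed scalar "dummy" classifier w = 1.0 is from being optimal in each training domain. The sketch below follows that published relaxation; the helper names and the per-domain batching convention are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits, y):
    # Squared gradient of the per-domain risk w.r.t. a dummy classifier w = 1.0;
    # it vanishes exactly when the dummy classifier is simultaneously optimal,
    # i.e., when the predictor is invariant for this domain.
    w = torch.tensor(1.0, requires_grad=True)
    loss = F.cross_entropy(logits * w, y)
    (grad,) = torch.autograd.grad(loss, [w], create_graph=True)
    return grad.pow(2)

def irm_objective(featurizer, classifier, env_batches, lam):
    # Sum of per-domain risks R^e plus the weighted invariance penalty.
    risk, penalty = 0.0, 0.0
    for x, y in env_batches:  # one (inputs, labels) batch per training domain
        logits = classifier(featurizer(x))
        risk = risk + F.cross_entropy(logits, y)
        penalty = penalty + irmv1_penalty(logits, y)
    return risk + lam * penalty
```

As discussed in the introduction, even this relaxation can latch onto pseudo-invariant features or geometric skews; the bottleneck term of IIB is designed to suppress exactly those solutions.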

References

[1] Gerald Tesauro, et al. Learning to Learn without Forgetting By Maximizing Transfer and Minimizing Interference, 2018, ICLR.

[2] Kurt Keutzer, et al. Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Tomas Pfister, et al. Learning from Simulated and Unsupervised Images through Adversarial Training, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Trevor Darrell, et al. Semi-supervised Domain Adaptation with Instance Constraints, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[5] Michael I. Jordan, et al. Learning Transferable Features with Deep Adaptation Networks, 2015, ICML.

[6] Sethuraman Panchanathan, et al. Deep Hashing Network for Unsupervised Domain Adaptation, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Luc Van Gool, et al. ComboGAN: Unrestrained Scalability for Image Domain Translation, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[8] Regina Barzilay, et al. Domain Extrapolation via Regret Minimization, 2020, ArXiv.

[9] Trevor Darrell, et al. Semi-Supervised Domain Adaptation via Minimax Entropy, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10] Jonathan Baxter, et al. A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling, 1997, Machine Learning.

[11] Rich Caruana, et al. Multitask Learning: A Knowledge-Based Source of Inductive Bias, 1993, ICML.

[12] Qiang Yang, et al. A Survey on Transfer Learning, 2010, IEEE Transactions on Knowledge and Data Engineering.

[13] Chang Xu, et al. Self-Supervised Representation Learning From Multi-Domain Data, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14] Adriana Kovashka, et al. Domain Generalization Using Shape Representation, 2020, ECCV Workshops.

[15] Kurt Keutzer, et al. Multi-source Domain Adaptation in the Deep Learning Era: A Systematic Survey, 2020, ArXiv.

[16] David Tse, et al. A Minimax Approach to Supervised Learning, 2016, NIPS.

[17] Eric P. Xing, et al. Self-Challenging Improves Cross-Domain Generalization, 2020, ECCV.

[18] Yoshua Bengio, et al. Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization, 2021, ArXiv.

[19] Tommi S. Jaakkola, et al. Invariant Rationalization, 2020, ICML.

[20] Fabio Maria Carlucci, et al. From Source to Target and Back: Symmetric Bi-Directional Adaptive GAN, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21] Charles X. Ling, et al. Fast Generalized Distillation for Semi-Supervised Domain Adaptation, 2017, AAAI.

[22] Tong Che, et al. Rethinking Distributional Matching Based Domain Adaptation, 2020, ArXiv.

[23] Victor S. Lempitsky, et al. Unsupervised Domain Adaptation by Backpropagation, 2014, ICML.

[24] Pradeep Ravikumar, et al. The Risks of Invariant Risk Minimization, 2020, ICLR.

[25] Alexander A. Alemi, et al. Deep Variational Information Bottleneck, 2017, ICLR.

[26] Marc'Aurelio Ranzato, et al. Efficient Lifelong Learning with A-GEM, 2018, ICLR.

[27] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[28] Vladimir Vapnik, et al. An overview of statistical learning theory, 1999, IEEE Trans. Neural Networks.

[29] Trevor Darrell, et al. Adversarial Discriminative Domain Adaptation, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Siddhartha Chaudhuri, et al. Generalizing Across Domains via Cross-Gradient Training, 2018, ICLR.

[31] José M. F. Moura, et al. Adversarial Multiple Source Domain Adaptation, 2018, NeurIPS.

[32] Fulton Wang. Addressing two issues in machine learning: interpretability and dataset shift, 2018.

[33] Alexandros Karatzoglou, et al. Overcoming Catastrophic Forgetting with Hard Attention to the Task, 2018.

[34] Lincan Zou, et al. Improve Unsupervised Domain Adaptation with Mixup Training, 2020, ArXiv.

[35] Chrisantha Fernando, et al. PathNet: Evolution Channels Gradient Descent in Super Neural Networks, 2017, ArXiv.

[36] Cordelia Schmid, et al. What makes for good views for contrastive learning, 2020, NeurIPS.

[37] Sergey Levine, et al. Adaptive Risk Minimization: A Meta-Learning Approach for Tackling Group Shift, 2020, ArXiv.

[38] Juergen Gall, et al. Open Set Domain Adaptation, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39] Han Zhao, et al. On Learning Invariant Representations for Domain Adaptation, 2019, ICML.

[40] Minh-Triet Tran, et al. Image Alignment in Unseen Domains via Domain Deep Generalization, 2019, ArXiv.

[41] Federico Tombari, et al. Batch Normalization Embeddings for Deep Domain Generalization, 2020, Pattern Recognit.

[43] Kurt Keutzer, et al. MADAN: Multi-source Adversarial Domain Aggregation Network for Domain Adaptation, 2020, International Journal of Computer Vision.

[44] Christina Heinze-Deml, et al. Conditional variance penalties and domain shift robustness, 2017, Machine Learning.

[45] Yu-Chiang Frank Wang, et al. A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation, 2018, NeurIPS.

[46] Percy Liang, et al. Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization, 2019, ArXiv.

[47] Kate Saenko, et al. Return of Frustratingly Easy Domain Adaptation, 2015, AAAI.

[48] Trevor Cohn, et al. Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser, 2015, ACL.

[49] Slobodan Ilic, et al. DeceptionNet: Network-Driven Domain Randomization, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[50] Ye Xu, et al. Unbiased Metric Learning: On the Utilization of Multiple Datasets and Web Images for Softening Bias, 2013, 2013 IEEE International Conference on Computer Vision.

[51] Svetlana Lazebnik, et al. PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[52] Nathan Srebro, et al. Does Invariant Risk Minimization Capture Invariance?, 2021, ArXiv.

[53] Judea Pearl, et al. Causal Inference, 2010.

[54] Ivor W. Tsang, et al. Learning With Augmented Features for Supervised and Semi-Supervised Heterogeneous Domain Adaptation, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55] Chao Chen, et al. HoMM: Higher-order Moment Matching for Unsupervised Domain Adaptation, 2019, AAAI.

[56] Tae-Hyun Oh, et al. Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels, 2020, ArXiv.

[57] George Trigeorgis, et al. Domain Separation Networks, 2016, NIPS.

[58] Geoffrey J. Gordon, et al. Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift, 2020, NeurIPS.

[59] Jiwon Kim, et al. Continual Learning with Deep Generative Replay, 2017, NIPS.

[60] Bernhard Schölkopf, et al. Domain Adaptation with Conditional Transferable Components, 2016, ICML.

[61] Tatsuya Harada, et al. Open Set Domain Adaptation by Backpropagation, 2018, ECCV.

[62] Fei Chen, et al. Risk Variance Penalization: From Distributional Robustness to Causality, 2020, ArXiv.

[63] Yi Yang, et al. Contrastive Adaptation Network for Unsupervised Domain Adaptation, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[64] Shruti Tople, et al. Domain Generalization using Causal Matching, 2020, ICML.

[65] Bo Wang, et al. Moment Matching for Multi-Source Domain Adaptation, 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[66] Han Zhao, et al. Efficient Multitask Feature and Relationship Learning, 2017, UAI.

[67] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.

[68] Swami Sankaranarayanan, et al. MetaReg: Towards Domain Generalization using Meta-Regularization, 2018, NeurIPS.

[69] Alberto L. Sangiovanni-Vincentelli, et al. Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization Without Accessing Target Domain Data, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[70] D. Tao, et al. Deep Domain Generalization via Conditional Invariant Adversarial Networks, 2018, ECCV.

[71] Yongxin Yang, et al. Deeper, Broader and Artier Domain Generalization, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[72] Michael I. Jordan, et al. Conditional Adversarial Domain Adaptation, 2017, NeurIPS.

[73] Mengjie Zhang, et al. Deep Reconstruction-Classification Networks for Unsupervised Domain Adaptation, 2016, ECCV.

[74] Robert M. French, et al. Catastrophic Interference in Connectionist Networks: Can It Be Predicted, Can It Be Prevented?, 1993, NIPS.

[75] Rich Caruana, et al. Algorithms and Applications for Multitask Learning, 1996, ICML.

[76] Koby Crammer, et al. A theory of learning from different domains, 2010, Machine Learning.

[77] Naftali Tishby, et al. Deep learning and the information bottleneck principle, 2015, 2015 IEEE Information Theory Workshop (ITW).

[78] Barbara Caputo, et al. Best Sources Forward: Domain Generalization through Source-Specific Nets, 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).

[79] Yongxin Yang, et al. Learning to Generalize: Meta-Learning for Domain Generalization, 2017, AAAI.

[80] Qingming Huang, et al. Deep Unsupervised Convolutional Domain Adaptation, 2017, ACM Multimedia.

[81] Trevor Darrell, et al. ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation, 2020, AAAI.

[82] Sung Ju Hwang, et al. Lifelong Learning with Dynamically Expandable Networks, 2017, ICLR.

[83] Alexandre Bernardino, et al. Applying Domain Randomization to Synthetic Data for Object Category Detection, 2018, ArXiv.

[84] Yutaka Matsuo, et al. Adversarial Invariant Feature Learning with Accuracy Constraint for Domain Generalization, 2019, ECML/PKDD.

[85] Aaron C. Courville, et al. Out-of-Distribution Generalization via Risk Extrapolation (REx), 2020, ICML.

[86] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.

[87] Pradeep Ravikumar, et al. Fundamental Limits and Tradeoffs in Invariant Representation Learning, 2020, ArXiv.

[88] Suguru Arimoto, et al. An algorithm for computing the capacity of arbitrary discrete memoryless channels, 1972, IEEE Trans. Inf. Theory.

[89] Dumitru Erhan, et al. Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[90] David Lopez-Paz, et al. In Search of Lost Domain Generalization, 2020, ICLR.

[91] Pietro Perona, et al. Recognition in Terra Incognita, 2018, ECCV.

[92] Thomas Steinke, et al. Reasoning About Generalization via Conditional Mutual Information, 2020, COLT.

[93] Shanghang Zhang, et al. Instance Adaptive Self-Training for Unsupervised Domain Adaptation, 2020, ECCV.

[94] Behnam Neyshabur, et al. Understanding the Failure Modes of Out-of-Distribution Generalization, 2021, ICLR.

[95] Ling Shao, et al. Learning to Learn with Variational Information Bottleneck for Domain Generalization, 2020, ECCV.

[96] Xi Peng, et al. Learning to Learn Single Domain Generalization, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[97] Rajesh Ranganath, et al. Support and Invertibility in Domain-Invariant Representations, 2019, AISTATS.

[98] Marc'Aurelio Ranzato, et al. Gradient Episodic Memory for Continual Learning, 2017, NIPS.

[99] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[100] Kate Saenko, et al. Deep CORAL: Correlation Alignment for Deep Domain Adaptation, 2016, ECCV Workshops.

[101] François Laviolette, et al. Domain-Adversarial Training of Neural Networks, 2015, J. Mach. Learn. Res.

[102] Fabio Maria Carlucci, et al. Domain Generalization by Solving Jigsaw Puzzles, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[103] W. Bialek, et al. Predictability, Complexity, and Learning, 2002.

[104] Jeffrey Scott Vitter, et al. Random sampling with a reservoir, 1985, TOMS.

[105] Mario Marchand, et al. Domain-adversarial training of neural networks, 2016.

[106] Donggeun Yoo, et al. Reducing Domain Gap by Reducing Style Bias, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[107] Ronald Kemker, et al. FearNet: Brain-Inspired Model for Incremental Learning, 2017, ICLR.

[108] Han Zhao, et al. Unsupervised Domain Adaptation with a Relaxed Covariate Shift Assumption, 2017, AAAI.

[109] Geoffrey E. Hinton, et al. Regularizing Neural Networks by Penalizing Confident Output Distributions, 2017, ICLR.

[110] Gabriela Csurka, et al. A Comprehensive Survey on Domain Adaptation for Visual Applications, 2017, Domain Adaptation in Computer Vision Applications.

[111] Zhibo Chen, et al. Style Normalization and Restitution for Domain Generalization and Adaptation, 2021, IEEE Transactions on Multimedia.

[112] Kurt Keutzer, et al. Multi-source Domain Adaptation for Semantic Segmentation, 2019, NeurIPS.

[113] Minh-Triet Tran, et al. Recognition in Unseen Domains: Domain Generalization via Universal Non-volume Preserving Models, 2019, ArXiv.

[114] Aaron C. Courville, et al. Out-of-Distribution Generalization via Risk Extrapolation, 2020.

[115] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.

[116] Anthony V. Robins, et al. Catastrophic Forgetting, Rehearsal and Pseudorehearsal, 1995, Connect. Sci.

[117] Bernhard Schölkopf, et al. Domain Generalization via Invariant Feature Representation, 2013, ICML.

[118] Shakir Mohamed, et al. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, 2015, NIPS.

[119] Mengjie Zhang, et al. Domain Generalization for Object Recognition with Multi-task Autoencoders, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[120] Chong-Wah Ngo, et al. Semi-supervised Domain Adaptation with Subspace Learning for visual recognition, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[121] Naftali Tishby, et al. The information bottleneck method, 2000, ArXiv.

[122] Gilles Blanchard, et al. Domain Generalization by Marginal Transfer Learning, 2017, J. Mach. Learn. Res.

[123] David Lopez-Paz, et al. Invariant Risk Minimization, 2019, ArXiv.

[124] Richard E. Blahut, et al. Computation of channel capacity and rate-distortion functions, 1972, IEEE Trans. Inf. Theory.

[125] Alberto L. Sangiovanni-Vincentelli, et al. A Review of Single-Source Deep Unsupervised Visual Domain Adaptation, 2020, IEEE Transactions on Neural Networks and Learning Systems.

[126] Kate Saenko, et al. Domain Agnostic Learning with Disentangled Representations, 2019, ICML.

[127] Yongxin Yang, et al. Trace Norm Regularised Deep Multi-Task Learning, 2016, ICLR.

[128] D. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies, 1974.

[129] Christoph H. Lampert, et al. iCaRL: Incremental Classifier and Representation Learning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[130] Bernhard Schölkopf, et al. Multi-Source Domain Adaptation: A Causal View, 2015, AAAI.