Sequential Domain Adaptation by Synthesizing Distributionally Robust Experts

Least squares estimators trained on only a few target-domain samples may predict poorly. Supervised domain adaptation aims to improve predictive accuracy by exploiting additional labeled training samples from a source distribution that is close to the target distribution. Given the available data, we investigate novel strategies to synthesize a family of least squares estimator experts that are robust with respect to moment conditions. When these moment conditions are specified using Kullback-Leibler or Wasserstein-type divergences, we can find the robust estimators efficiently using convex optimization. We apply the Bernstein online aggregation algorithm to the proposed family of robust experts to generate predictions for the sequential stream of target test samples. Numerical experiments on real data show that the robust strategies may outperform non-robust interpolations of the empirical least squares estimators.
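The pipeline described above has two ingredients: a family of least squares experts built from source and target data, and Bernstein online aggregation (BOA) over those experts on the target stream. The following is a minimal pure-Python sketch under stated assumptions, not the authors' method. It builds only the non-robust baseline: experts obtained by interpolating the source and target empirical least squares estimators over a hypothetical weight grid. The Kullback-Leibler and Wasserstein robust experts, which require solving convex programs, are not implemented. The aggregation step is a simplified fixed-learning-rate BOA that uses the linearized (gradient-trick) excess squared loss. The helper names (`least_squares`, `interpolated_experts`, `boa`) and the parameters `ridge`, `grid`, and `eta` are illustrative.

```python
import math


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (A square, nonsingular)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row for column c
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def least_squares(X, y, ridge=1e-8):
    """Empirical least squares via the normal equations (tiny ridge term for stability)."""
    d = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) + (ridge if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(d)]
    return solve(A, b)


def interpolated_experts(w_src, w_tgt, grid):
    """Hypothetical non-robust expert family: w(lam) = lam * w_src + (1 - lam) * w_tgt."""
    return [[lam * s + (1.0 - lam) * t for s, t in zip(w_src, w_tgt)] for lam in grid]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def boa(experts, stream, eta=0.5):
    """Fixed-learning-rate BOA with linearized squared loss.

    Returns the aggregated prediction made before each label is revealed,
    and the final expert weights.
    """
    K = len(experts)
    pi = [1.0 / K] * K  # uniform prior over experts
    preds = []
    for x, y in stream:
        f = [dot(w, x) for w in experts]  # each expert's forecast
        y_hat = dot(pi, f)  # aggregated forecast
        preds.append(y_hat)
        g = 2.0 * (y_hat - y)  # derivative of (y_hat - y)^2 w.r.t. y_hat
        l = [g * (fj - y_hat) for fj in f]  # linearized excess loss per expert
        # Second-order (Bernstein) exponential weights: exp(-eta * l * (1 + eta * l))
        u = [p * math.exp(-eta * lj * (1.0 + eta * lj)) for p, lj in zip(pi, l)]
        s = sum(u)
        pi = [ui / s for ui in u]
    return preds, pi
```

A usage pattern, again only illustrative: fit `w_tgt = least_squares(X_tgt, y_tgt)` on the few target samples and `w_src = least_squares(X_src, y_src)` on the source samples, form `experts = interpolated_experts(w_src, w_tgt, [0.0, 0.25, 0.5, 0.75, 1.0])`, and call `boa(experts, stream)` on the sequential target test pairs. Replacing `interpolated_experts` with robust estimators is where the paper's convex programs would enter. The original BOA analysis also covers adaptive, expert-specific learning rates, which this fixed-`eta` sketch omits.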
