Statistical testing under distributional shifts

In this work, we introduce statistical testing under distributional shifts. We are interested in the hypothesis P* ∈ H0 for a target distribution P*, but observe data from a different distribution Q*. We assume that P* is related to Q* through a known shift τ and formally introduce hypothesis testing in this setting. We propose a general testing procedure that first resamples from the observed data to construct an auxiliary data set and then applies an existing test in the target domain. We prove that if the size of the resample is of order o(√n) and the resampling weights are well-behaved, this procedure inherits the pointwise asymptotic level and power of the target test. If the map τ is estimated from data, these guarantees are maintained under mild conditions, provided the estimation is sufficiently accurate. We further extend our results to uniform asymptotic level and to a different resampling scheme. Testing under distributional shifts allows us to tackle a diverse set of problems: we argue that it may prove useful in reinforcement learning and covariate shift, we show how it reduces conditional independence testing to unconditional independence testing, and we provide example applications in causal inference.
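
To make the procedure concrete, the following Python sketch illustrates the resample-then-test idea under the assumption that the shift τ is given by a known density ratio dP*/dQ* and that resampling is done with replacement. The function names (shifted_test, density_ratio, target_test), the choice of resample size, and the Pearson-correlation target test are illustrative placeholders, not the exact construction analysed in the paper.

```python
# Minimal sketch of the resample-then-test procedure, assuming the shift tau
# corresponds to a known density ratio r(z) = dP*/dQ*(z). Names and choices
# below are illustrative, not the paper's exact construction.
import numpy as np
from scipy.stats import pearsonr


def shifted_test(Z, density_ratio, target_test, m=None, rng=None):
    """Test a hypothesis about the target distribution P* using data Z drawn from Q*."""
    rng = np.random.default_rng(rng)
    n = Z.shape[0]
    if m is None:
        m = int(n ** 0.45)  # resample size kept strictly below sqrt(n), as required above

    # Importance weights proportional to the density ratio, normalised to a distribution.
    w = density_ratio(Z)
    w = w / w.sum()

    # Draw a small auxiliary data set that (asymptotically) behaves like a sample from P*.
    idx = rng.choice(n, size=m, replace=True, p=w)

    # Apply an existing test, valid for i.i.d. data from P*, to the auxiliary sample.
    return target_test(Z[idx])


# Illustrative use: data observed under Q*, target hypothesis "zero correlation under P*".
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(10_000, 2))               # stand-in sample from Q*
    ratio = lambda A: np.exp(-0.5 * A[:, 0])       # density ratio for a mean shift in the first coordinate (up to a constant)
    test = lambda A: pearsonr(A[:, 0], A[:, 1])[1] # p-value of a Pearson correlation test
    print(shifted_test(Z, ratio, test, rng=1))
```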
