Error asymmetry in causal and anticausal regression

It is generally difficult to make statements about the expected prediction error in a univariate setting without further knowledge of how the data were generated. Recent work has shown that knowledge of the true underlying causal structure of the data-generating process has implications for various machine learning settings. Assuming an additive noise model and independence between the data-generating mechanism and its input, we draw a novel connection between the intrinsic causal relationship of two variables and the expected prediction error. We formulate a theorem stating that, when the true data-generating function is used as the prediction model, the expected error is generally smaller when the effect is predicted from its cause than when, conversely, the cause is predicted from its effect. The theorem thus implies an asymmetry in the expected error that depends on the prediction direction. We corroborate this finding with empirical evaluations on artificial and real-world data sets.
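To make the claimed asymmetry concrete, here is a minimal simulation sketch (not the paper's experiment code) under an assumed invertible additive noise model Y = f(X) + N with N independent of X; the specific choice of f, the noise scale, and the input distribution are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: compare prediction error in the causal direction
# (effect from cause, using the true f) against the anticausal
# direction (cause from effect, using the inverse of f).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.uniform(0.0, 1.0, n)      # cause X
noise = rng.normal(0.0, 0.1, n)   # additive noise N, independent of X
f = np.cbrt                       # invertible data-generating function f
y = f(x) + noise                  # effect Y = f(X) + N


def f_inv(v):
    return v ** 3  # inverse of the cube root


# Causal direction: predict the effect from the cause with the true f.
# The squared error here equals the noise variance in expectation.
mse_causal = np.mean((y - f(x)) ** 2)

# Anticausal direction: predict the cause from the effect with f^{-1}.
mse_anticausal = np.mean((x - f_inv(y)) ** 2)

print(f"causal MSE (predict Y from X):     {mse_causal:.4f}")
print(f"anticausal MSE (predict X from Y): {mse_anticausal:.4f}")
```

In this particular setup the causal-direction error matches the noise variance (about 0.01), while inverting the same function in the anticausal direction yields a noticeably larger error, consistent with the direction of the asymmetry stated above.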
