Confounder Detection in High-Dimensional Linear Models Using First Moments of Spectral Measures

In this letter, we study the problem of confounder detection in the linear model, where the target variable $Y$ is predicted from its $n$ potential causes $X_n = (x_1, \ldots, x_n)^T$. Assuming a rotation-invariant generating process for the model, a recent study showed that the spectral measure induced by the regression coefficient vector with respect to the covariance matrix of $X_n$ is close to a uniform measure in purely causal cases. When a scalar confounder is present, however, it deviates from a uniform measure in a characteristic way. Analyzing the pattern of the spectral measure can therefore help to detect confounding. Here we propose to use the first moment of the spectral measure for confounder detection. We calculate the first moment of the spectral measure induced by the regression vector and compare it with the first moment of a uniform spectral measure, both defined with respect to the covariance matrix of $X_n$. The two moments coincide in nonconfounded cases and differ in the presence of confounding. This statistical asymmetry between the causal and the confounded cases can be used for confounder detection. Because it does not require analyzing the pattern of the spectral measure, our method avoids the difficulty of choosing a metric and of optimizing multiple parameters. Experiments on synthetic and real data demonstrate the performance of the method.
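To make the comparison of first moments concrete, here is a minimal Python sketch. It assumes the usual construction in which the measure induced by a vector $a$ places mass $\langle a, \phi_j \rangle^2$ on each eigenvalue $\lambda_j$ of the covariance matrix $\Sigma$, so that its first moment is $a^T \Sigma a$. It further assumes that the uniform measure carries the same total mass $\|a\|^2$ and places mass $\|a\|^2/n$ on each eigenvalue, which gives a first moment of $\|a\|^2 \operatorname{tr}(\Sigma)/n$. Several elements are hypothetical illustrations rather than the letter's exact procedure: the function names `first_moments` and `population_ratio`, the use of the ratio of the two moments as a score, and the synthetic model $X = E + bZ$, $Y = a^T X + cZ + F$ with a scalar confounder $Z$. The simulation works with population covariances, so it involves no sampling noise.

```python
import numpy as np


def first_moments(sigma, a):
    """First moments of two spectral measures defined w.r.t. the covariance sigma.

    Induced measure: mass <a, phi_j>^2 on eigenvalue lambda_j.
    Uniform measure (assumed normalization): mass ||a||^2 / n on every eigenvalue.
    """
    lam, phi = np.linalg.eigh(sigma)                # eigenvalues and orthonormal eigenvectors
    weights = (phi.T @ a) ** 2                      # <a, phi_j>^2; these sum to ||a||^2
    m_induced = float(lam @ weights)                # equals a^T sigma a
    m_uniform = float(weights.sum() * lam.mean())   # ||a||^2 * tr(sigma) / n
    return m_induced, m_uniform


def population_ratio(n=800, aspect=0.6, b_scale=0.3, c=10.0, confounded=True, seed=0):
    """Hypothetical synthetic check under the assumed model
    X = E + b Z,  Y = a^T X + c Z + F,  with Var(Z) = 1.

    Returns m_induced / m_uniform for the population regression vector
    Sigma_X^{-1} Cov(X, Y). The noise F does not affect this vector.
    """
    rng = np.random.default_rng(seed)
    m = int(n / aspect)
    mix = rng.standard_normal((n, m)) / np.sqrt(m)
    sigma_e = mix @ mix.T                           # Cov(E): Wishart-type, Marchenko-Pastur-like spectrum
    a = rng.standard_normal(n) / np.sqrt(n)         # causal coefficients, isotropic (rotation-invariant)
    b = b_scale * rng.standard_normal(n) / np.sqrt(n)  # confounder loadings, drawn independently
    if not confounded:
        # Same random draws as the confounded run, but the confounder is switched off.
        b, c = np.zeros(n), 0.0
    sigma_x = sigma_e + np.outer(b, b)              # Cov(X) when Var(Z) = 1
    # Cov(X, Y) = Sigma_X a + c b, hence the regression vector is a + c Sigma_X^{-1} b.
    a_tilde = a + c * np.linalg.solve(sigma_x, b)
    m_induced, m_uniform = first_moments(sigma_x, a_tilde)
    return m_induced / m_uniform


if __name__ == "__main__":
    print("no confounder:", population_ratio(confounded=False))
    print("with confounder:", population_ratio(confounded=True))
```

In this setup the ratio stays close to 1 without a confounder and falls well below 1 with one. The reason is that the confounding term $c\,\Sigma_X^{-1} b$ weights the eigendirections of $\Sigma_X$ roughly in proportion to $1/\lambda_j^2$, so it concentrates on small eigenvalues. With finite samples, the population covariances would be replaced by sample estimates. Any decision threshold on the ratio would also have to be chosen separately, and none is given here.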
