Causal Feature Selection via Orthogonal Search

Inferring the direct causal parents of a response variable among a large set of explanatory variables is of high practical importance in many disciplines. Recent work in causal discovery exploits the invariance of models across different experimental conditions to detect direct causal links. However, these approaches generally do not scale well with the number of explanatory variables, are difficult to extend to nonlinear relationships, and require data from several experiments. Inspired by debiased machine learning methods, we study a one-vs.-the-rest feature selection approach to discover the direct causal parents of the response. We propose an algorithm that works with purely observational data and offers theoretical guarantees, including in the case of partially nonlinear relationships. Because it requires only one estimation per variable, the approach can be applied even to large graphs, and it demonstrates significant improvements over established approaches.
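The one-vs.-the-rest idea can be illustrated with a minimal sketch of cross-fitted partialling-out, in the spirit of double (debiased) machine learning. This is not the algorithm from the paper: the function name `orthogonal_scores`, the choice of ordinary least squares for both nuisance regressions, the two-fold split, and the selection threshold of 4.0 are all assumptions made for illustration. Each feature gets exactly one effect estimate, obtained by residualizing both that feature and the response on all remaining features and then regressing residual on residual.

```python
import numpy as np

def orthogonal_scores(X, y, n_folds=2, seed=0):
    """Per-feature effect estimates and t-statistics via cross-fitted
    partialling-out (illustrative sketch with linear nuisance models)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    theta = np.zeros(d)
    t_stat = np.zeros(d)
    for j in range(d):
        # "The rest": all other features plus an intercept column.
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        res_x = np.zeros(n)
        res_y = np.zeros(n)
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            # Fit both nuisance regressions on the training folds only.
            beta_x, *_ = np.linalg.lstsq(Z[train_idx], X[train_idx, j], rcond=None)
            beta_y, *_ = np.linalg.lstsq(Z[train_idx], y[train_idx], rcond=None)
            # Residualize the held-out fold.
            res_x[test_idx] = X[test_idx, j] - Z[test_idx] @ beta_x
            res_y[test_idx] = y[test_idx] - Z[test_idx] @ beta_y
        denom = np.sum(res_x ** 2)
        theta[j] = np.sum(res_x * res_y) / denom
        # Heteroskedasticity-robust (sandwich) standard error.
        eps = res_y - theta[j] * res_x
        se = np.sqrt(np.sum((res_x * eps) ** 2)) / denom
        t_stat[j] = theta[j] / se
    return theta, t_stat

# Synthetic data (an assumption for the demo): features 0 and 2 are the
# only direct parents of y.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=2000)

theta, t_stat = orthogonal_scores(X, y)
# Hypothetical selection rule: keep features whose |t| exceeds 4.0.
selected = [j for j in range(X.shape[1]) if abs(t_stat[j]) > 4.0]
print("estimates:", np.round(theta, 2))
print("selected features:", selected)
```

Swapping the least-squares fits for a flexible regressor would be the natural way to accommodate nonlinear dependence on the remaining variables; the linear version is used here only to keep the sketch dependency-free beyond NumPy.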
