Causal Feature Selection

This chapter reviews techniques for learning causal relationships from data, as applied to the problem of feature selection. Most feature selection methods do not attempt to uncover causal relationships between features and the target; they focus instead on making the best predictions. We examine situations in which knowledge of causal relationships benefits feature selection. Such benefits may include explaining relevance in terms of causal mechanisms, distinguishing between actual features and experimental artifacts, predicting the consequences of actions performed by external agents, and making predictions in non-stationary environments. Conversely, we highlight the benefits that causal discovery may draw from recent developments in feature selection theory and algorithms.
