Causal discovery on high dimensional data

Existing causal discovery algorithms are usually neither effective nor efficient enough on high-dimensional data, because high dimensionality reduces discovery accuracy and increases computational complexity. To alleviate these problems, we present a three-phase approach to learning the structure of nonlinear causal models that takes advantage of a feature selection method and two state-of-the-art causal discovery methods. In the first phase, a greedy search based on the Max-Relevance and Min-Redundancy (mRMR) criterion is employed to discover the candidate causal set for each variable, from which a rough skeleton of the causal network is generated. In the second phase, a constraint-based method is used to prune the rough skeleton into an accurate skeleton. In the third phase, the direction-learning algorithm IGCI is applied to orient the edges of the accurate skeleton. Experimental results show that the proposed approach is both effective and scalable, with particularly interesting findings on high-dimensional data.
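The first and third phases can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a histogram-based mutual-information estimate for the mRMR greedy search, and the entropy-based variant of IGCI (under a uniform reference measure, the cause tends to have higher differential entropy than the effect). The kernel-based conditional independence pruning of phase two is omitted here for brevity.

```python
import numpy as np

def mi(x, y, bins=8):
    # Histogram-based mutual information estimate I(X; Y) (nats).
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mrmr_candidates(X, target, k=3, bins=8):
    # Phase 1 (sketch): greedily pick k candidate neighbors of `target`
    # maximizing relevance MI(X_j; X_target) minus mean redundancy
    # with the already-selected variables.
    remaining = [j for j in range(X.shape[1]) if j != target]
    selected = []
    while remaining and len(selected) < k:
        def score(j):
            relevance = mi(X[:, j], X[:, target], bins)
            redundancy = (np.mean([mi(X[:, j], X[:, s], bins) for s in selected])
                          if selected else 0.0)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

def igci_direction(x, y):
    # Phase 3 (sketch): entropy-based IGCI on [0, 1]-normalized data.
    # Returns +1 for x -> y, -1 for y -> x. Uses a simple
    # spacing-based differential entropy estimate; the additive
    # constant cancels because both sides use the same estimator.
    def h(v):
        v = np.sort((v - v.min()) / (v.max() - v.min() + 1e-12))
        d = np.diff(v)
        d = d[d > 0]
        return float(np.mean(np.log(d)) + np.log(len(v)))
    # Under IGCI's assumptions the effect has lower entropy than the cause.
    return 1 if h(y) < h(x) else -1
```

For example, with `x` uniform on [0, 1] and `y = x**3`, `igci_direction(x, y)` orients the edge as `x -> y`, since the nonlinear map concentrates `y`'s density and lowers its entropy. The mRMR-selected candidate sets are then used to build the rough skeleton that phase two refines.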
