Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm

We consider the PC-algorithm (Spirtes et al., 2000) for estimating the skeleton and equivalence class of a very high-dimensional directed acyclic graph (DAG) with corresponding Gaussian distribution. The PC-algorithm is computationally feasible and often very fast for sparse problems with many nodes (variables), and it has the attractive property of automatically achieving high computational efficiency as a function of the sparseness of the true underlying DAG. We prove uniform consistency of the algorithm for very high-dimensional, sparse DAGs, where the number of nodes is allowed to grow quickly with the sample size n, as fast as O(n^a) for any 0 < a < ∞. The sparseness assumption is rather minimal, requiring only that the neighborhoods in the DAG are of lower order than the sample size n. We also demonstrate the PC-algorithm on simulated data.
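The skeleton-estimation step of the PC-algorithm described above can be sketched in a few lines. This is a minimal illustrative implementation, not the authors' code: for Gaussian data the conditional-independence tests are the standard ones based on sample partial correlations and Fisher's z-transform; the function names and the significance level `alpha` are our own choices. Starting from the complete graph, edges are deleted as conditioning sets of growing size render a partial correlation insignificant, which is what ties the running time to the sparseness of the true DAG.

```python
import itertools
import math
from statistics import NormalDist

import numpy as np


def partial_corr(corr, i, j, S):
    """Sample partial correlation of variables i and j given the set S,
    read off from the inverse of the relevant correlation submatrix."""
    idx = [i, j] + list(S)
    prec = np.linalg.pinv(corr[np.ix_(idx, idx)])
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def pc_skeleton(data, alpha=0.01):
    """Estimate the skeleton of a DAG from Gaussian data (n x p array).

    Start from the complete graph; for conditioning sets S of growing
    size ell, delete edge i-j as soon as some S within adj(i) without j
    makes the partial correlation insignificant (Fisher z test)."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided threshold
    adj = {i: set(range(p)) - {i} for i in range(p)}
    ell = 0
    while any(len(adj[i]) - 1 >= ell for i in range(p)):
        for i in range(p):
            for j in sorted(adj[i]):
                if j not in adj[i]:  # edge may already have been removed
                    continue
                for S in itertools.combinations(sorted(adj[i] - {j}), ell):
                    r = partial_corr(corr, i, j, S)
                    r = min(max(r, -0.9999999), 0.9999999)
                    # Fisher z-transform: sqrt(n - |S| - 3) * atanh(r)
                    # is approximately N(0, 1) under independence.
                    z = math.sqrt(n - ell - 3) * math.atanh(r)
                    if abs(z) <= crit:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        ell += 1
    return adj
```

On data simulated from a chain X1 → X2 → X3, for instance, the estimated skeleton should keep the edges 1–2 and 2–3 and drop 1–3 once the algorithm conditions on X2; neighborhoods never grow, so only small conditioning sets are ever tested, mirroring the efficiency-from-sparsity property noted in the abstract.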

[1]  R. Neapolitan  Learning Bayesian networks, 2007, KDD '07.

[2]  Peng Zhao, et al.  On Model Selection Consistency of Lasso, 2006, J. Mach. Learn. Res.

[3]  Constantin F. Aliferis, et al.  The max-min hill-climbing Bayesian network structure learning algorithm, 2006, Machine Learning.

[4]  Eytan Domany, et al.  On the Number of Samples Needed to Learn the Correct Structure of a Bayesian Network, 2006, UAI.

[5]  N. Meinshausen, et al.  High-dimensional graphs and variable selection with the Lasso, 2006, math/0608017.

[6]  Anna Goldenberg, et al.  Tractable learning of large Bayes net structures from sparse data, 2004, ICML.

[7]  J. Robins, et al.  Uniform consistency in causal inference, 2003.

[8]  P. Spirtes, et al.  Causation, Prediction, and Search, 2nd ed., 2000, MIT Press.

[9]  David Maxwell Chickering, et al.  Optimal Structure Identification With Greedy Search, 2003, J. Mach. Learn. Res.

[10]  Jiji Zhang, et al.  Strong Faithfulness and Uniform Consistency in Causal Inference, 2002, UAI.

[11]  D. Edwards  Introduction to Graphical Modelling, 2000, Springer.

[12]  Michael D. Perlman, et al.  Enumerating Markov Equivalence Classes of Acyclic Digraph Models, 2001, UAI.

[13]  Andrew Y. Ng, et al.  On Feature Selection: Learning with Exponentially Many Irrelevant Features as Training Examples, 1998, ICML.

[14]  David Maxwell Chickering, et al.  Learning Equivalence Classes of Bayesian Network Structures, 1996, UAI.

[15]  Christopher Meek, et al.  Strong completeness and faithfulness in Bayesian networks, 1995, UAI.

[16]  Christopher Meek, et al.  Causal inference and causal explanation with background knowledge, 1995, UAI.

[17]  D. Geiger, et al.  Learning Bayesian networks: The combination of knowledge and statistical data, 1994, Machine Learning.

[18]  David J. Spiegelhalter, et al.  Bayesian analysis in expert systems, 1993.

[19]  Judea Pearl, et al.  Equivalence and Synthesis of Causal Models, 1990, UAI.

[20]  C. Chow, et al.  Approximating discrete probability distributions with dependence trees, 1968, IEEE Trans. Inf. Theory.

[21]  H. Hotelling  New Light on the Correlation Coefficient and its Transforms, 1953.

[22]  T. W. Anderson  An Introduction to Multivariate Statistical Analysis, 2003, Wiley.

[23]  M. Tarsi, et al.  A simple algorithm to construct a consistent extension of a partially oriented graph, 1992.

[24]  Judea Pearl, et al.  A Theory of Inferred Causation, 1991, KR.

[25]  F. Harary  New directions in the theory of graphs, 1973.

[26]  C. Quensel  The distribution of the partial correlation coefficient in samples from multivariate universes in a special case of non-normally distributed random variables, 1953.

[27]  R. Fisher  The Distribution of the Partial Correlation Coefficient, 1924.