Controlling the False Discovery Rate of the Association/Causality Structure Learned with the PC Algorithm

In real-world applications, graphical statistical models are not only a tool for tasks such as classification or prediction; the network structures of the models are often of great interest in their own right (e.g., in modeling brain connectivity). The false discovery rate (FDR), the expected ratio of falsely claimed connections to all claimed connections, is often a reasonable error-rate criterion in these applications. However, current learning algorithms for graphical models have not been adequately adapted to address the FDR. The traditional practice of controlling the type I and type II error rates at conventional levels does not necessarily keep the FDR low, especially for sparse networks. In this paper, we propose embedding an FDR-control procedure into the PC algorithm to curb the FDR of the skeleton of the learned graph. We prove that the proposed method can control the FDR under a user-specified level in the limit of large sample sizes. For moderate sample sizes (several hundred), empirical experiments show that the method still keeps the FDR under the user-specified level, and that a heuristic modification of the method controls the FDR more accurately around the user-specified level. The proposed method is applicable to any model for which statistical tests of conditional independence are available, such as discrete models and Gaussian models.
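
To make the idea of embedding an FDR-control step into the PC algorithm more concrete, here is a minimal sketch in Python. It is not the paper's exact procedure, and it rests on assumptions that are not stated above: the data are Gaussian, so conditional independence is tested with Fisher's z-transform of the sample partial correlation; the FDR-control step is the Benjamini-Hochberg step-up procedure; and each candidate edge keeps the largest p-value it has received so far, with the FDR step applied to those maxima after each conditioning-set size. All function names (`fisher_z_pvalue`, `correlation_matrix`, `partial_corr`, `benjamini_hochberg`, `pc_fdr_skeleton_sketch`) are hypothetical.

```python
import itertools
import math


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: (partial) correlation = 0, via Fisher's z.

    r: sample partial correlation, n: sample size,
    k: size of the conditioning set (requires n - k - 3 > 0).
    """
    r = max(min(r, 1.0 - 1e-12), -1.0 + 1e-12)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def correlation_matrix(data):
    """Sample correlation matrix of a list of equal-length rows."""
    n, p = len(data), len(data[0])
    means = [sum(row[c] for row in data) / n for c in range(p)]
    cov = [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
            for b in range(p)] for a in range(p)]
    return [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(p)]
            for a in range(p)]


def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given the tuple cond (recursive formula)."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1.0 - r_ik ** 2) * (1.0 - r_jk ** 2))


def benjamini_hochberg(pvalues, q):
    """Keys whose null hypotheses the BH step-up procedure rejects at level q."""
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, pv) in enumerate(ranked, start=1):
        if pv <= rank * q / m:
            cutoff = rank  # step-up: keep the largest qualifying rank
    return {key for key, _ in ranked[:cutoff]}


def pc_fdr_skeleton_sketch(data, q=0.05, max_order=None):
    """Simplified PC-style skeleton search with a BH step after each order.

    Hypothetical illustration: each pair keeps the maximum p-value over all
    conditional-independence tests run on it, and after every conditioning-set
    size the BH procedure is applied to those maxima over all candidate pairs.
    """
    n, p = len(data), len(data[0])
    corr = correlation_matrix(data)
    pairs = list(itertools.combinations(range(p), 2))
    edges = set(pairs)
    p_max = {e: 0.0 for e in pairs}  # largest p-value seen for each pair
    max_order = p - 2 if max_order is None else max_order
    for order in range(max_order + 1):
        # Adjacency sets rebuilt from the edges that survived the last step.
        adj = {v: set() for v in range(p)}
        for i, j in edges:
            adj[i].add(j)
            adj[j].add(i)
        for i, j in sorted(edges):
            # As in PC, condition on subsets of adj(i)\{j} or adj(j)\{i}.
            for side in (adj[i] - {j}, adj[j] - {i}):
                for cond in itertools.combinations(sorted(side), order):
                    r = partial_corr(corr, i, j, cond)
                    p_max[(i, j)] = max(p_max[(i, j)], fisher_z_pvalue(r, n, order))
        # FDR step over all candidate pairs; removed edges are never re-added.
        edges &= benjamini_hochberg(p_max, q)
    return edges
```

For data simulated from a chain X → Y → Z (columns 0, 1, 2), the direct edges (0, 1) and (1, 2) should survive, while (0, 2) is typically removed once the test conditions on Y. Applying the FDR step to all p(p - 1)/2 candidate pairs, rather than only to the edges still present, keeps the number of hypotheses fixed across iterations; this is a design choice of the sketch, not a claim about the paper's method.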