Discovering biomedical causality by a generative Bayesian causal network under uncertainty

With the rapid development of biomedical technology, discovering causality among genes and human physiological and pathological characteristics has become a popular but challenging research topic over the past decades. As the volume of biomedical data grows, it becomes increasingly difficult to search this large body of knowledge in a meaningful manner and to discover causality from observed data. To address the issues in existing causal discovery models, we introduce a generative Bayesian causal network that incorporates a neural network to explicitly characterize these cause-effect relationships as a variable number of nodes and links. In particular, a basic skeleton is generated for node selection, reducing the network size by minimizing the maximum mean discrepancy among variables. In addition, a causal generative neural network model is presented to construct the causal network with cause-effect scores between variables. Empirical evaluations on two publicly available biomedical datasets and four synthetic datasets suggest that our approach significantly outperforms state-of-the-art methods in discovering causal relationships among biomedical variables.
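The abstract does not spell out how the maximum mean discrepancy (MMD) is computed, so the following is only a minimal sketch of the standard biased MMD estimator between two samples, not the authors' procedure. The Gaussian kernel, the `bandwidth` parameter, and the function names `rbf_kernel` and `mmd_squared` are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of a (n, d) and b (m, d)."""
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2ab.
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y.

    The kernel choice and bandwidth are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat 1-D inputs as n samples of a single variable.
    x = x.reshape(x.shape[0], -1)
    y = y.reshape(y.shape[0], -1)
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same_a = rng.normal(size=(200, 1))
    same_b = rng.normal(size=(200, 1))
    shifted = rng.normal(loc=3.0, size=(200, 1))
    # Samples from the same distribution should give a smaller value
    # than samples from clearly different distributions.
    print("same distribution:   ", mmd_squared(same_a, same_b))
    print("shifted distribution:", mmd_squared(same_a, shifted))
```

A small value indicates that the two samples are hard to distinguish under the chosen kernel. How such a score would feed into skeleton generation and node selection is not described in the abstract and is left open here.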
