Learning massive interpretable gene regulatory networks of the human brain by merging Bayesian Networks

We present Fast Greedy Equivalence Search (FGES)-Merge, a new method for learning the structure of gene regulatory networks by merging locally learned Bayesian networks, based on the Fast Greedy Equivalence Search algorithm. The method is competitive with the state of the art in terms of the Matthews correlation coefficient, which takes into account both precision and recall, and it improves on the state of the art in speed: it scales to tens of thousands of variables and can use empirical knowledge about the topological structure of gene regulatory networks. We apply the method to learn the gene regulatory network for the full human genome, using data from samples of different brain structures (from the Allen Human Brain Atlas). Furthermore, this Bayesian network model should predict interactions between genes in a way that is clear to experts, in line with current trends in explainable artificial intelligence. To this end, we also present a new open-access visualization tool that facilitates the exploration of massive networks and can help identify nodes of interest for experimental tests.
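As a point of reference for the evaluation metric named above, the following is a minimal sketch of how a Matthews correlation coefficient could be computed for a predicted network against a reference network. It assumes edges are compared as undirected node pairs, which may differ from the evaluation protocol actually used; the function names `edge_confusion` and `mcc` and the toy four-gene example are hypothetical and chosen only for illustration.

```python
import math


def edge_confusion(true_edges, predicted_edges, nodes):
    """Count TP, FP, FN, TN over all unordered node pairs.

    Assumption: edges are treated as undirected, so (a, b) == (b, a).
    """
    true_set = {frozenset(e) for e in true_edges}
    pred_set = {frozenset(e) for e in predicted_edges}
    n = len(nodes)
    total_pairs = n * (n - 1) // 2  # every possible undirected edge
    tp = len(true_set & pred_set)   # edges present in both networks
    fp = len(pred_set - true_set)   # predicted but not in the reference
    fn = len(true_set - pred_set)   # in the reference but not predicted
    tn = total_pairs - tp - fp - fn
    return tp, fp, fn, tn


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example with four hypothetical genes.
nodes = ["A", "B", "C", "D"]
reference = [("A", "B"), ("B", "C"), ("C", "D")]
predicted = [("B", "A"), ("B", "C"), ("A", "D")]

counts = edge_confusion(reference, predicted, nodes)
score = mcc(*counts)
```

Because true negatives enter the formula, the score stays informative for sparse networks, where absent edges vastly outnumber present ones.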
