Using Bayesian networks to discover relations between genes, environment, and disease

We review the applicability of Bayesian networks (BNs) for discovering relations between genes, environment, and disease. By translating probabilistic dependencies among variables into graphical models and vice versa, BNs provide a comprehensible and modular framework for representing complex systems. We first describe the Bayesian network approach and its applicability to understanding the genetic and environmental basis of disease. We then describe a variety of algorithms for learning the structure of a network from observational data. Because of their relevance to real-world applications, the topics of missing data and causal interpretation are emphasized. The BN approach is then exemplified through application to data from a population-based study of bladder cancer in New Hampshire, USA. For didactic purposes, we intentionally keep this example simple. When the algorithms are applied to complete data records, we find only minor differences in their performance and results. Subsequently incorporating partial records through the EM algorithm gives us greater power to detect relations. Allowing for network structures that depart from a strict causal interpretation also enhances our ability to discover complex associations, including gene-gene (epistasis) and gene-environment interactions. While BNs are already powerful tools for the genetic dissection of disease and the generation of prognostic models, some conceptual and computational challenges remain. These include the proper handling of continuous variables and unmeasured factors, the explicit incorporation of prior knowledge, and the evaluation and communication of the robustness of substantive conclusions to alternative assumptions and data manifestations.
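To make concrete how a BN translates a graph into a probability model, the following is a minimal sketch, not the method or data of the study. It assumes a hypothetical three-node network in which a binary genotype G and a binary exposure E are both parents of a binary disease D, so the joint distribution factorizes as P(G) P(E) P(D | G, E). All variable names and probabilities are invented for illustration.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only,
# not estimates from the bladder cancer study).
P_G = {0: 0.7, 1: 0.3}   # P(G): carrier status of a risk genotype
P_E = {0: 0.8, 1: 0.2}   # P(E): environmental exposure
P_D1_GIVEN_GE = {        # P(D=1 | G, E): disease risk for each parent configuration
    (0, 0): 0.01,
    (0, 1): 0.03,
    (1, 0): 0.02,
    (1, 1): 0.12,        # risk exceeds what either factor contributes alone
}


def joint(g, e, d):
    """P(G=g, E=e, D=d) under the factorization P(G) P(E) P(D | G, E)."""
    p_d1 = P_D1_GIVEN_GE[(g, e)]
    return P_G[g] * P_E[e] * (p_d1 if d == 1 else 1.0 - p_d1)


def marginal_disease():
    """P(D=1), obtained by summing the joint over both parents."""
    return sum(joint(g, e, 1) for g, e in product((0, 1), repeat=2))


def posterior_genotype_given_disease():
    """P(G=1 | D=1), obtained by Bayes' rule from the joint."""
    return sum(joint(1, e, 1) for e in (0, 1)) / marginal_disease()


if __name__ == "__main__":
    print("P(D=1)       =", marginal_disease())
    print("P(G=1 | D=1) =", posterior_genotype_given_disease())
```

Because the graph encodes the factorization, each table can be specified or re-estimated separately, which is the modularity referred to above. In this toy example the gene-environment interaction lives entirely in the table for D given its two parents.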
