Boosting Probabilistic Graphical Model Inference by Incorporating Prior Knowledge from Multiple Sources

Inferring regulatory networks from experimental data via probabilistic graphical models is a popular framework for gaining insights into biological systems. However, the inherent noise in experimental data, coupled with limited sample sizes, reduces the performance of network reverse engineering. Prior knowledge from existing sources of biological information can address this low signal-to-noise problem by biasing network inference towards biologically plausible network structures. Although integrating various sources of information is desirable, their heterogeneous nature makes this task challenging. We propose two computational methods to incorporate various information sources into a probabilistic consensus structure prior for use in graphical model inference. Our first model, called the Latent Factor Model (LFM), assumes a high degree of correlation among external information sources and reconstructs a hidden variable as a common source in a Bayesian manner. The second model, a Noisy-OR, picks up the strongest support for an interaction among the information sources in a probabilistic fashion. Our extensive computational studies on KEGG signaling pathways, as well as on gene expression data from breast cancer and the yeast heat shock response, reveal that both approaches can significantly enhance the reconstruction accuracy of Bayesian Networks compared to other competing methods and to inference without any prior. Our framework allows for the use of diverse information sources, such as pathway databases, GO terms, and protein domain data, and is flexible enough to integrate new sources as they become available.
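The Noisy-OR idea can be illustrated with a minimal sketch. It assumes that each information source has already been turned into a per-edge confidence in [0, 1]. The function names, the optional `leak` term, and the log-prior form used for scoring a candidate graph are illustrative assumptions rather than the authors' exact formulation. Under a Noisy-OR, an edge is missed only if every source fails to support it, so the combined confidence is never lower than the strongest single source.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # hypothetical edge type: (regulator, target)


def noisy_or(confidences: List[float], leak: float = 0.0) -> float:
    """Generic noisy-OR: P(edge) = 1 - (1 - leak) * prod_k (1 - p_k).

    The result is >= max(p_k), so the strongest source dominates.
    `leak` is an assumed background probability, not from the source text.
    """
    miss = 1.0 - leak
    for p in confidences:
        if not 0.0 <= p <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        miss *= 1.0 - p  # probability that this source also "misses" the edge
    return 1.0 - miss


def consensus_prior(sources: List[Dict[Edge, float]], leak: float = 0.0) -> Dict[Edge, float]:
    """Combine several edge-confidence maps; an edge absent from a source counts as 0."""
    edges = set().union(*sources)
    return {e: noisy_or([s.get(e, 0.0) for s in sources], leak) for e in edges}


def log_structure_prior(graph_edges: Set[Edge], candidate_edges: Iterable[Edge],
                        prior: Dict[Edge, float], eps: float = 1e-6) -> float:
    """Assumed independent-edge prior: sum of log p for present edges and
    log(1 - p) for absent ones. Probabilities are clamped to avoid log(0)."""
    total = 0.0
    for e in candidate_edges:
        p = min(max(prior.get(e, 0.0), eps), 1.0 - eps)
        total += math.log(p) if e in graph_edges else math.log(1.0 - p)
    return total


if __name__ == "__main__":
    pathway_db = {("A", "B"): 0.9}                 # hypothetical pathway-database scores
    go_sim = {("A", "B"): 0.5, ("A", "C"): 0.3}    # hypothetical GO-similarity scores
    prior = consensus_prior([pathway_db, go_sim])
    print(prior)
    print(log_structure_prior({("A", "B")}, [("A", "B"), ("A", "C")], prior))
```

In a score-based Bayesian network search, a term like `log_structure_prior` would be added to the data log-likelihood score of each candidate graph, biasing the search towards edges supported by the consensus prior.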