A Bayes Random Field Approach for Integrative Large-Scale Regulatory Network Analysis

We present a Bayes-Random Fields framework capable of integrating an arbitrary number of data sources to discover the relevant architecture of large-scale networks. The random field potential function is designed to impose a cluster constraint and is combined with a fully Bayesian approach for incorporating heterogeneous data sets. The probabilistic nature of the framework supports robust analysis, minimizing the influence of noise inherent in the data on the inferred structure in a seamless and coherent manner. We demonstrate this in applications to both large-scale synthetic data sets and Saccharomyces cerevisiae data sets. The analytical and experimental results reveal the varied characteristics of the different data types and reflect their discriminative ability in identifying direct gene interactions.
