Bayesian Joint Modeling of Multiple Gene Networks and Diverse Genomic Data to Identify Target Genes of a Transcription Factor

We consider integrative modeling of multiple gene networks and diverse genomic data, including protein-DNA binding, gene expression and DNA sequence data, to accurately identify the regulatory target genes of a transcription factor (TF). Rather than treating all genes equally and independently a priori, as existing joint modeling approaches do, we incorporate the biological prior knowledge that neighboring genes on a gene network tend to be (or not to be) regulated together by a TF. A key contribution of our work is that, to make full use of existing biological knowledge, we allow multiple gene networks to be incorporated into the joint modeling of genomic data by introducing a mixture model based on multiple Markov random fields (MRFs). Another important contribution is that we allow the different types of genomic data to be correlated, which lets us examine the validity and effect of the independence assumption adopted in existing methods. Because the approach is fully Bayesian, inference about model parameters can be carried out from MCMC samples. An application to an E. coli data set, together with simulation studies, demonstrates the utility and the statistical efficiency gains of the proposed joint model.
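
To make the idea of a network-based prior concrete, the following is a minimal sketch, not the model specified in the paper. It assumes a simple auto-logistic MRF, in which a gene's prior probability of being a target depends on the share of its network neighbors currently labeled as targets, and it combines several networks through fixed mixture weights. The function names, the parameters `gamma` and `beta`, and the toy networks are all hypothetical and chosen only for illustration.

```python
import math


def mrf_conditional_prob(gene, labels, neighbors, gamma=0.0, beta=1.0):
    """Prior P(z_gene = 1 | neighbor labels) under an assumed auto-logistic MRF.

    labels    : dict mapping gene -> 0/1 (1 = currently labeled a target)
    neighbors : dict mapping gene -> list of neighboring genes on one network
    gamma     : baseline log-odds of being a target (hypothetical parameter)
    beta      : strength of neighbor influence (hypothetical parameter)
    """
    nbrs = neighbors.get(gene, [])
    if not nbrs:
        # An isolated gene falls back to the baseline log-odds.
        logit = gamma
    else:
        n_target = sum(labels[j] for j in nbrs)
        n_non_target = len(nbrs) - n_target
        # Neighbors labeled as targets raise the log-odds; non-targets lower it.
        logit = gamma + beta * (n_target - n_non_target) / len(nbrs)
    return 1.0 / (1.0 + math.exp(-logit))


def mixture_conditional_prob(gene, labels, networks, weights, gamma=0.0, beta=1.0):
    """Weighted mixture of per-network MRF conditionals (weights sum to 1)."""
    return sum(
        w * mrf_conditional_prob(gene, labels, net, gamma, beta)
        for w, net in zip(weights, networks)
    )


# Toy example: two hypothetical networks over four genes.
labels = {"a": 1, "b": 1, "c": 0, "d": 0}
net1 = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
net2 = {"a": ["c", "d"], "b": [], "c": ["a"], "d": ["a"]}

p_net1 = mrf_conditional_prob("a", labels, net1)  # neighbor b is a target
p_net2 = mrf_conditional_prob("a", labels, net2)  # neighbors c, d are not
p_mix = mixture_conditional_prob("a", labels, [net1, net2], [0.5, 0.5])
```

In this toy setting, gene `a` receives a higher prior probability from the first network, where its only neighbor is a target, than from the second, where neither neighbor is. The mixture lets both networks contribute. In a fully Bayesian treatment the labels, the mixture weights and the MRF parameters would be sampled by MCMC rather than fixed as they are here.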
