Novel Data Fusion Method and Exploration of Multiple Information Sources for Transcription Factor Target Gene Prediction

Background. Revealing protein-DNA interactions is a key problem in understanding transcriptional regulation at the mechanistic level. Computational methods play an important role in predicting transcription factor target genes genome-wide. Fusing multiple data sources is a natural way to improve these predictions, because sequence specificity alone is not sufficient to accurately predict transcription factor binding sites.

Methods. Here we develop a new data fusion method that combines multiple genome-level data sources. We study the extent to which DNA duplex stability and nucleosome positioning information, either alone or in combination with other data sources, can improve the prediction of transcription factor target genes.

Results. Results on a carefully constructed test set of verified binding sites in the mouse genome demonstrate that our new data fusion method can reduce false positive rates, and that DNA duplex stability and nucleosome occupancy data can improve the accuracy of transcription factor target gene predictions, especially when combined with other genome-level data sources. Cross-validation and other randomization tests confirm the predictive performance of our method. Our results also show that nonredundant data sources provide the most efficient data fusion.
