Statistical mechanical modeling of genome-wide transcription factor occupancy data by MatrixREDUCE

MOTIVATION: Regulation of gene expression by a transcription factor requires physical interaction between the factor and the DNA, which can be described by a statistical mechanical model. Based on this model, we developed the MatrixREDUCE algorithm, which uses genome-wide occupancy data for a transcription factor (e.g. ChIP-chip) and the associated nucleotide sequences to discover the sequence-specific binding affinity of the transcription factor. Our approach has two advantages. First, it uses the information from all probes on the microarray efficiently, because there is no need to delineate "bound" and "unbound" sequences. Second, unlike information content-based methods, it does not require a background sequence model.

RESULTS: We validated the performance of MatrixREDUCE by inferring the sequence-specific binding affinities of several transcription factors in S. cerevisiae and comparing the results with three independent sources of sequence-specific affinity information: (i) experimental measurements of transcription factor binding affinities for specific oligonucleotides, (ii) reporter gene assays for promoters with systematically mutated binding sites, and (iii) relative binding affinities obtained by modeling transcription factor-DNA interactions based on co-crystal structures of transcription factors bound to DNA substrates. The binding affinities inferred by MatrixREDUCE are in good agreement with all three validation methods.

AVAILABILITY: MatrixREDUCE source code is freely available for non-commercial use at http://www.bussemakerlab.org/. The software runs on Linux, Unix, and Mac OS X.
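The abstract describes the statistical mechanical model only in words. The following is a minimal sketch, not the MatrixREDUCE source code, of how such a model can be fit to occupancy data without splitting probes into "bound" and "unbound" sets. It assumes that (i) positions contribute independently, so a site's relative affinity is the product of per-position factors `w[j, b]`; (ii) binding is far from saturation, so a probe's expected occupancy is proportional to the summed site affinities `N` over both strands; and (iii) the measured ratio responds linearly, `ratio ≈ C + F·N`. The function names, the toy matrix, and the synthetic data are hypothetical illustrations. The real algorithm also optimizes the affinity matrix itself, which this sketch holds fixed.

```python
import numpy as np

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def window_affinity(window, w):
    """Relative affinity of one site (assumption): product over positions j of w[j, base].

    Multiplying per-position factors corresponds to additive binding free energies.
    Only A/C/G/T are handled; any other character raises ValueError.
    """
    return float(np.prod([w[j, BASES.index(b)] for j, b in enumerate(window)]))


def predicted_occupancy(seq, w):
    """Sum of relative affinities over every window on both strands.

    In the non-saturating regime (assumption), a probe's expected occupancy
    is proportional to this sum.
    """
    L = w.shape[0]
    rc = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    total = 0.0
    for strand in (seq, rc):
        for i in range(len(strand) - L + 1):
            total += window_affinity(strand[i:i + L], w)
    return total


def fit_linear_response(seqs, ratios, w):
    """Least-squares fit of ratio ~ C + F * N over ALL probes (no bound/unbound cutoff)."""
    N = np.array([predicted_occupancy(s, w) for s in seqs])
    X = np.column_stack([np.ones_like(N), N])
    (C, F), *_ = np.linalg.lstsq(X, np.asarray(ratios, dtype=float), rcond=None)
    return C, F, N


if __name__ == "__main__":
    # Hypothetical 2-bp matrix preferring "GA"; every other base gets relative affinity 0.1.
    w = np.full((2, 4), 0.1)
    w[0, BASES.index("G")] = 1.0
    w[1, BASES.index("A")] = 1.0
    seqs = ["GATT", "CCCC", "GAGA", "TTTT", "ACGT"]
    # Synthetic ratios generated from the model itself, so the fit should recover C=1, F=2.
    ratios = [1.0 + 2.0 * predicted_occupancy(s, w) for s in seqs]
    C, F, _ = fit_linear_response(seqs, ratios, w)
    print(f"C={C:.3f}, F={F:.3f}")
```

Every probe enters the regression with its measured value, so no threshold is needed to call sequences "bound". Because `w` holds relative affinities rather than log-odds scores against base frequencies, no background sequence model appears anywhere in the fit.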
