A Linear Model for Transcription Factor Binding Affinity Prediction in Protein Binding Microarrays

Protein binding microarrays (PBMs) are a high-throughput technology for characterizing protein-DNA binding. An array measures a protein's affinity toward thousands of double-stranded DNA sequences at once, producing a comprehensive catalog of its binding specificity. We present a linear model that predicts the binding affinity of a protein toward DNA sequences from PBM data. The model represents the measured intensity of an individual probe as the sum of the binding affinity contributions of the probe's subsequences. These subsequences characterize a DNA binding motif and can be used to predict the intensity of protein binding against arbitrary DNA sequences. Our method was the best performer in the Dialogue for Reverse Engineering Assessments and Methods 5 (DREAM5) transcription factor/DNA motif recognition challenge. For the DREAM5 bonus challenge, we also developed an approach for identifying transcription factors (TFs) from their PBM binding profiles; this approach achieved the best performance in the bonus challenge.
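The additive idea can be illustrated with a minimal sketch in which a probe's intensity is modeled as an intercept plus the summed contributions of its overlapping k-mers. Everything specific here is an assumption made for illustration and is not taken from the abstract: the subsequence length `k = 3`, the plain stochastic gradient descent fit, the helper names `kmer_counts`, `predict`, and `fit`, and the toy probes and weights.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping length-k subsequence of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def predict(seq, weights, k, intercept=0.0):
    """Intensity = intercept + sum of per-k-mer contributions (linear model)."""
    counts = kmer_counts(seq, k)
    return intercept + sum(weights.get(m, 0.0) * n for m, n in counts.items())


def fit(probes, intensities, k, epochs=500, lr=0.05):
    """Fit k-mer contributions by SGD on squared error.

    Hypothetical fitting procedure chosen for brevity; the abstract does not
    say how the model is estimated.
    """
    weights, intercept = {}, 0.0
    features = [kmer_counts(p, k) for p in probes]
    for _ in range(epochs):
        for counts, y in zip(features, intensities):
            pred = intercept + sum(weights.get(m, 0.0) * n
                                   for m, n in counts.items())
            err = pred - y
            intercept -= lr * err
            for m, n in counts.items():
                weights[m] = weights.get(m, 0.0) - lr * err * n
    return weights, intercept


# Toy data (assumed): intensities generated from two "true" 3-mer contributions.
K = 3
true_weights = {"ACG": 2.0, "GAC": 1.0}
probes = ["ACGTAC", "TTGACG", "GACGTT", "CCCTAA", "ACGACG", "TTTTTT"]
intensities = [predict(p, true_weights, K) for p in probes]

weights, intercept = fit(probes, intensities, K)
fitted = [predict(p, weights, K, intercept) for p in probes]
max_train_error = max(abs(a - b) for a, b in zip(fitted, intensities))
print("largest training error:", max_train_error)
```

Once fitted, the same `predict` call scores any DNA sequence, which mirrors the abstract's point that the learned contributions apply to sequences beyond the array's probes. With real PBM data there are far more candidate subsequences than in this toy set, so some form of regularization or feature selection would likely be needed; that choice is left open here.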
