Determination of specificity influencing residues for key transcription factor families

Transcription factors (TFs) are major modulators of transcription and of the cellular processes that depend on it. The binding of a TF to specific regulatory elements is governed by its specificity. Given the gap between known TF sequences and known specificities, specificity prediction frameworks are highly desirable. Key inputs to such frameworks are the protein residues that modulate the specificity of the TF under consideration. Simple measures such as mutual information (MI), used to delineate specificity-influencing residues (SIRs) from alignments, fail because the three-dimensional structure of the protein imposes constraints on the evolution of its amino-acid sequence, and these structural constraints lead to the identification of false SIRs. In this manuscript we extend three methods (direct information, PSICOV and adjusted mutual information), which have been used to disentangle spurious indirect protein residue-residue contacts from direct contacts, to identify SIRs from joint alignments of amino acids and specificity. We predicted SIRs for the homeodomain (HD), helix-loop-helix, LacI and GntR families of TFs using these methods and compared the results to MI. Using various measures, we show that the three methods perform comparably to one another and better than MI. The implications of these methods for specificity prediction frameworks are discussed. The methods are implemented as an R package, available along with the alignments at http://stormo.wustl.edu/SpecPred.
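As an illustration of the MI baseline mentioned above, the following is a minimal sketch of scoring a single amino-acid column against a single specificity (DNA base) column of a joint alignment. It is written in Python for brevity and is not the authors' R package; the function name `mutual_information` and the toy columns are hypothetical, and no pseudocounts, sequence weighting or gap handling are applied.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plain mutual information (in bits) between two aligned columns.

    xs, ys: equal-length sequences of symbols, one entry per TF in the
    joint alignment (hypothetical input layout).
    """
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    n = len(xs)
    px = Counter(xs)            # marginal counts of the residue column
    py = Counter(ys)            # marginal counts of the specificity column
    pxy = Counter(zip(xs, ys))  # joint counts of (residue, base) pairs
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Hypothetical toy data: six TFs, two residue positions, one motif position.
residue_col_a = ["N", "N", "K", "K", "N", "K"]  # tracks the preferred base
residue_col_b = ["A", "A", "A", "G", "G", "A"]  # independent of the base
base_col = ["A", "A", "G", "G", "A", "G"]

mi_a = mutual_information(residue_col_a, base_col)
mi_b = mutual_information(residue_col_b, base_col)
```

In this toy case the first residue column scores higher than the second, so it would be ranked as the more likely SIR. The sketch says nothing about whether a high score reflects a direct influence on specificity or an indirect, structurally induced correlation, which is the limitation of MI that the three extended methods are meant to address.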
