PiDNA: predicting protein–DNA interactions with structural models

Predicting the genomic binding sites of a transcription factor is an important but challenging problem in the study of gene regulation. Over the past decade, the large number of protein–DNA co-crystal structures deposited in the Protein Data Bank has facilitated the understanding of the interaction mechanisms between transcription factors and their binding sites. Recent studies have shown that both physics-based and knowledge-based potential functions can be applied to protein–DNA complex structures to derive position weight matrices (PWMs) that are consistent with experimental data. To make further use of the available structural models, the proposed Web server, PiDNA, first constructs reliable PWMs by applying an atomic-level knowledge-based scoring function to numerous in silico mutated complex structures, and then uses the PWM built from the structure models with small energy changes to predict interactions between the protein and DNA sequences. In a single run, PiDNA predicts the relative preference for all DNA sequences that carry a limited number of mutations relative to the native sequence co-crystallized in the model. Predictions for sequences with an unlimited number of mutations can be obtained through additional requests or file upload. Three types of information can be downloaded after prediction: (i) the ranked list of mutated sequences, (ii) the PWM constructed from the favourable mutated structures, and (iii) any mutated protein–DNA complex structure model specified by the user. This study first shows that the constructed PWMs are similar to the annotated PWMs collected from databases or the literature. Second, the accuracy of PiDNA in detecting relatively high-specificity sites is evaluated by comparing its ranked lists against in vitro protein-binding microarray experiments. Finally, PiDNA is shown to select experimentally validated binding sites from among 10 000 random sites with high accuracy.
With PiDNA, users can design biological experiments based on the predicted sequence specificity and/or request mutated structure models for further protein design. In addition, PiDNA is expected to be combined with chromatin immunoprecipitation data to refine large-scale inference of in vivo protein–DNA interactions. PiDNA is available at: http://dna.bime.ntu.edu.tw/pidna.
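
The two-step idea described above (keep the mutated sequences whose energy stays close to the native one, build a PWM from them, then score other sequences against it) can be illustrated with a minimal Python sketch. This is not the PiDNA implementation: the function names, the energy cutoff `max_delta`, the pseudocount, the uniform background and the toy energies are all assumptions made for illustration, and the real server obtains its energies from an atomic-level knowledge-based scoring function applied to mutated complex structures.

```python
import math

BASES = "ACGT"


def select_favourable(scored, native_energy, max_delta):
    """Keep sequences whose energy is within max_delta of the native energy.

    `scored` is a list of (sequence, energy) pairs; lower energy is assumed
    to mean more favourable binding (an assumption of this sketch).
    """
    return [seq for seq, energy in scored if energy - native_energy <= max_delta]


def build_pwm(sequences, pseudocount=0.5):
    """Build a column-wise base-frequency matrix from equal-length sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so no probability is zero.
        counts = {base: pseudocount for base in BASES}
        for seq in sequences:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in BASES})
    return pwm


def log_odds_score(pwm, seq, background=0.25):
    """Sum of per-position log2 odds against a uniform background."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, seq))


def rank_sequences(pwm, candidates):
    """Return candidates sorted from highest to lowest PWM score."""
    return sorted(candidates, key=lambda seq: log_odds_score(pwm, seq), reverse=True)


# Toy data: hypothetical energies for a 4-bp site, not taken from PiDNA.
native = "ACGT"
scored = [("ACGT", -10.0), ("ACGA", -9.5), ("TCGT", -9.0), ("GGGG", -2.0)]

favourable = select_favourable(scored, native_energy=-10.0, max_delta=1.5)
pwm = build_pwm(favourable)
ranking = rank_sequences(pwm, ["GGGG", "ACGT", "TCGA"])
```

In this toy case the native sequence `ACGT` ranks first and `GGGG`, whose energy is far from the native one and which therefore contributes nothing to the PWM, ranks last. The cutoff controls how many mutated sequences inform the matrix: a looser cutoff gives a flatter, less specific PWM.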
