PSSM-based prediction of DNA binding sites in proteins

BackgroundDetection of DNA-binding sites in proteins is of enormous interest for technologies targeting gene regulation and manipulation. We have previously shown that a residue and its sequence neighbor information can be used to predict DNA-binding candidates in a protein sequence. This sequence-based prediction method is applicable even if no sequence homology with a previously known DNA-binding protein is observed. Here we implement a neural network based algorithm to utilize evolutionary information of amino acid sequences in terms of their position specific scoring matrices (PSSMs) for a better prediction of DNA-binding sites.ResultsAn average of sensitivity and specificity using PSSMs is up to 8.7% better than the prediction with sequence information only. Much smaller data sets could be used to generate PSSM with minimal loss of prediction accuracy.ConclusionOne problem in using PSSM-derived prediction is obtaining lengthy and time-consuming alignments against large sequence databases. In order to speed up the process of generating PSSMs, we tried to use different reference data sets (sequence space) against which a target protein is scanned for PSI-BLAST iterations. We find that a very small set of proteins can actually be used as such a reference data without losing much of the prediction value. This makes the process of generating PSSMs very rapid and even amenable to be used at a genome level. A web server has been developed to provide these predictions of DNA-binding sites for any new protein from its amino acid sequence.AvailabilityOnline predictions based on this method are available at http://www.netasa.org/dbs-pssm/

[1]  Shandar Ahmad,et al.  Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information , 2004, Bioinform..

[2]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[3]  S. Selvaraj,et al.  Specificity of protein-DNA recognition revealed by structure-based potentials: symmetric/asymmetric and cognate/non-cognate binding. , 2002, Journal of molecular biology.

[4]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[5]  Aleksey Porollo,et al.  PROTEINS: Structure, Function, and Bioinformatics 56:753–767 (2004) Accurate Prediction of Solvent Accessibility Using Neural Networks–Based Regression , 2022 .

[6]  Yael Mandel-Gutfreund,et al.  Annotating nucleic acid-binding function based on protein structure. , 2003, Journal of molecular biology.

[7]  Akinori Sarai,et al.  Moment-based prediction of DNA-binding proteins. , 2004, Journal of molecular biology.

[8]  Janet M Thornton,et al.  Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity. , 2002, Journal of molecular biology.

[9]  H. Margalit,et al.  Quantitative parameters for amino acid-base interaction: implications for prediction of protein-DNA binding sites. , 1998, Nucleic acids research.

[10]  C. Pabo,et al.  Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition? , 2000, Journal of molecular biology.

[11]  G J Barton,et al.  Application of multiple sequence alignment profiles to improve protein secondary structure prediction , 2000, Proteins.

[12]  B. Rost,et al.  Improved prediction of protein secondary structure by use of sequence profiles and neural networks. , 1993, Proceedings of the National Academy of Sciences of the United States of America.

[13]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[14]  R. Apweiler Protein sequence databases. , 2000, Advances in protein chemistry.