Paper information - PSSM-based prediction of DNA binding sites in proteins

PSSM-based prediction of DNA binding sites in proteins

Background: Detection of DNA-binding sites in proteins is of enormous interest for technologies targeting gene regulation and manipulation. We have previously shown that information about a residue and its sequence neighbors can be used to predict DNA-binding candidates in a protein sequence. This sequence-based prediction method is applicable even if no sequence homology with a previously known DNA-binding protein is observed. Here we implement a neural-network-based algorithm that uses the evolutionary information in amino acid sequences, in the form of their position-specific scoring matrices (PSSMs), to improve the prediction of DNA-binding sites.

Results: The average of sensitivity and specificity obtained with PSSMs is up to 8.7% better than that of prediction with sequence information only. Much smaller data sets can be used to generate PSSMs with minimal loss of prediction accuracy.

Conclusion: One problem with PSSM-derived prediction is that it requires lengthy, time-consuming alignments against large sequence databases. To speed up the generation of PSSMs, we tried different reference data sets (sequence space) against which a target protein is scanned in PSI-BLAST iterations. We find that a very small set of proteins can be used as the reference data without losing much of the prediction value. This makes PSSM generation very rapid and even amenable to use at the genome level. A web server has been developed to provide these predictions of DNA-binding sites for any new protein from its amino acid sequence.

Availability: Online predictions based on this method are available at http://www.netasa.org/dbs-pssm/
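To make the abstract's two technical ideas concrete, here is a minimal sketch of (a) turning a PSSM into per-residue inputs that include sequence neighbors, and (b) the average of sensitivity and specificity used as the performance measure. It is an illustration under stated assumptions, not the authors' implementation. The function names, the window size (`half_window`), and the zero-padding at the chain ends are hypothetical choices, since the abstract does not specify them. Sensitivity and specificity follow their standard definitions.

```python
def window_features(pssm, half_window=2):
    """Build one input vector per residue from its PSSM row and its neighbors.

    pssm: list of rows, one per residue (a real PSSM has 20 scores per row).
    half_window: neighbors taken on each side (assumed value, not from the paper).
    Positions beyond the chain ends are zero-padded (also an assumption).
    """
    n_cols = len(pssm[0])
    pad = [[0.0] * n_cols for _ in range(half_window)]
    padded = pad + [[float(score) for score in row] for row in pssm] + pad
    width = 2 * half_window + 1
    features = []
    for i in range(len(pssm)):
        # Concatenate the rows of the window centered on residue i.
        features.append([s for row in padded[i:i + width] for s in row])
    return features


def mean_sensitivity_specificity(y_true, y_pred):
    """Average of sensitivity and specificity for binary labels (1 = binding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both binding and non-binding residues are required")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Toy 3-residue "PSSM" with 2 columns, only to keep the example readable.
    toy_pssm = [[1, 2], [3, 4], [5, 6]]
    for vector in window_features(toy_pssm, half_window=1):
        print(vector)
    print(mean_sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))
```

Each vector returned by `window_features` would be the input for one residue to a classifier such as a neural network. The output layer and training procedure are outside the scope of this sketch.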

Shandar Ahmad | Akinori Sarai
