Identification of ATP binding residues of a protein from its primary sequence

BackgroundOne of the major challenges in post-genomic era is to provide functional annotations for large number of proteins arising from genome sequencing projects. The function of many proteins depends on their interaction with small molecules or ligands. ATP is one such important ligand that plays critical role as a coenzyme in the functionality of many proteins. There is a need to develop method for identifying ATP interacting residues in a ATP binding proteins (ABPs), in order to understand mechanism of protein-ligands interaction.ResultsWe have compared the amino acid composition of ATP interacting and non-interacting regions of proteins and observed that certain residues are preferred for interaction with ATP. This study describes few models that have been developed for identifying ATP interacting residues in a protein. All these models were trained and tested on 168 non-redundant ABPs chains. First we have developed a Support Vector Machine (SVM) based model using primary sequence of proteins and obtained maximum MCC 0.33 with accuracy of 66.25%. Secondly, another SVM based model was developed using position specific scoring matrix (PSSM) generated by PSI-BLAST. The performance of this model was improved significantly (MCC 0.5) from the previous one, where only the primary sequence of the proteins were used.ConclusionThis study demonstrates that it is possible to predict 'ATP interacting residues' in a protein with moderate accuracy using its sequence. The evolutionary information is important for the identification of 'ATP interacting residues', as it provides more information compared to the primary sequence. This method will be useful for researchers studying ATP-binding proteins. Based on this study, a web server has been developed for predicting 'ATP interacting residues' in a protein http://www.imtech.res.in/raghava/atpint/.

[1]  M. Bhasin,et al.  Support Vector Machine-based Method for Subcellular Localization of Human Proteins Using Amino Acid Compositions, Their Order, and Similarity Search* , 2005, Journal of Biological Chemistry.

[2]  Jaime Prilusky,et al.  Automated analysis of interatomic contacts in proteins , 1999, Bioinform..

[3]  Gajendra P. S. Raghava,et al.  A neural‐network based method for prediction of γ‐turns in proteins from multiple sequence alignment , 2003, Protein science : a publication of the Protein Society.

[4]  Seungwoo Hwang,et al.  Using evolutionary and structural information to predict DNA‐binding sites on DNA‐binding proteins , 2006, Proteins.

[5]  S. Jones,et al.  Analysis of protein-protein interaction sites using surface patches. , 1997, Journal of molecular biology.

[6]  M. Levitt,et al.  Conformation of amino acid side-chains in proteins. , 1978, Journal of molecular biology.

[7]  D. T. Jones,et al.  A new approach to protein fold recognition , 1992, Nature.

[8]  Stefan Günther,et al.  SuperSite: dictionary of metabolite and drug binding sites in proteins , 2008, Nucleic Acids Res..

[9]  K. Chou,et al.  Prediction of protein structural classes. , 1995, Critical reviews in biochemistry and molecular biology.

[10]  Gajendra P.S. Raghava,et al.  Prediction of RNA binding sites in a protein using SVM and PSSM profile , 2008, Proteins.

[11]  G P S Raghava,et al.  A neural-network based method for prediction of gamma-turns in proteins from multiple sequence alignment. , 2003, Protein science : a publication of the Protein Society.

[12]  G. Singh Prediction of-turns in proteins from multiple alignment using neural network , 2002 .

[13]  Hao Lin,et al.  Prediction of cell wall lytic enzymes using Chou's amphiphilic pseudo amino acid composition. , 2009, Protein and peptide letters.

[14]  Nobutaka Hirokawa,et al.  Biochemical and molecular characterization of diseases linked to motor proteins. , 2003, Trends in biochemical sciences.

[15]  Satoru Miyano,et al.  A Weighted Profile Based Method for Protein-RNA Interacting Residue Prediction , 2006, Trans. Comp. Sys. Biology.

[16]  Sarah A. Teichmann,et al.  Principles of protein-protein interactions , 2002, ECCB.

[17]  K. Chou,et al.  Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms , 2008, Nature Protocols.

[18]  Shandar Ahmad,et al.  Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information , 2004, Bioinform..

[19]  Satoru Miyano,et al.  A neural network method for identification of RNA-interacting residues in protein. , 2004, Genome informatics. International Conference on Genome Informatics.

[20]  Rolf Apweiler,et al.  InterProScan - an integration platform for the signature-recognition methods in InterPro , 2001, Bioinform..

[21]  K. Chou Structural bioinformatics and its impact to biomedical science. , 2004, Current medicinal chemistry.

[22]  G P S Raghava,et al.  Support vector machine based prediction of glutathione S-transferase proteins. , 2007, Protein and peptide letters.

[24]  Gajendra Pal Singh Raghava,et al.  Prediction of β‐turns in proteins from multiple alignment using neural network , 2003, Protein science : a publication of the Protein Society.

[25]  R. Grantham Amino Acid Difference Formula to Help Explain Protein Evolution , 1974, Science.

[26]  Kuo-Chen Chou,et al.  GPCR‐CA: A cellular automaton image approach for predicting G‐protein–coupled receptor functional classes , 2009, J. Comput. Chem..

[27]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[28]  K. Chou,et al.  Recent progress in protein subcellular location prediction. , 2007, Analytical biochemistry.

[29]  Nancy R Forde,et al.  Mechanical processes in biochemistry. , 2004, Annual review of biochemistry.

[30]  K. Chou,et al.  EzyPred: a top-down approach for predicting enzyme functional classes and subclasses. , 2007, Biochemical and biophysical research communications.

[31]  M. Kanehisa,et al.  Prediction of protein function from sequence properties. Discriminant analysis of a data base. , 1984, Biochimica et biophysica acta.

[32]  Kuo-Chen Chou,et al.  MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM. , 2007, Biochemical and biophysical research communications.

[33]  Ting Guo,et al.  A novel statistical ligand-binding site predictor: application to ATP-binding sites. , 2005, Protein engineering, design & selection : PEDS.

[34]  Xiaoyong Zou,et al.  Prediction of protein secondary structure content by using the concept of Chou's pseudo amino acid composition and support vector machine. , 2009, Protein and peptide letters.

[35]  M. Go,et al.  An empirical approach for detecting nucleotide-binding sites on proteins. , 2006, Protein engineering, design & selection : PEDS.

[36]  Yanda Li,et al.  Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence , 2006, BMC Bioinformatics.

[37]  Thorsten Joachims,et al.  Making large-scale support vector machine learning practical , 1999 .