Identification of Ca2+-binding residues of a protein from its primary sequence

Calcium is one of the most abundant minerals in the human body and plays a critical role in many cellular activities by interacting with calcium ion (Ca2+)-binding proteins. Correct identification of Ca2+-binding residues is therefore essential for research on protein function. In this study, a new method was developed to predict Ca2+-binding residues from the primary sequence alone, without using three-dimensional structural information. Through statistical analysis, four kinds of feature parameters were extracted from amino acid sequences: the increment of diversity values of amino acid composition, the matrix scoring values of position conservation, the auto-cross covariance of physicochemical properties, and the center motif. These features served as input to a support vector machine that predicts Ca2+-binding residues. The method was tested on four well-established datasets using five-fold cross-validation. The accuracies and Matthews correlation coefficients were 75.9% and 0.53 (dataset 1), 79.2% and 0.58 (dataset 2), 77.4% and 0.55 (dataset 3), and 79.1% and 0.58 (dataset 4). Comparative results show that the developed method outperforms previous methods. Based on this study, a web server for predicting Ca2+-binding residues from any protein sequence was developed and is publicly available at http://202.207.29.245/.
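The two reported evaluation measures, accuracy and the Matthews correlation coefficient (MCC), can be computed from the counts of true and false positives and negatives. The following is a minimal sketch using the standard definitions of both measures; it is not taken from the study's code. The function names and the label convention (1 for a Ca2+-binding residue, 0 for a non-binding residue) are assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels.

    Assumed convention (not from the paper): 1 = binding, 0 = non-binding.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) using the standard definitions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion counts must not all be zero")
    accuracy = (tp + tn) / total
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    counts = confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
    print(accuracy_and_mcc(*counts))
```

For example, 40 true positives, 40 true negatives, 10 false positives, and 10 false negatives give an accuracy of 0.8 and an MCC of 0.6. In a five-fold cross-validation these measures would typically be computed on each held-out fold and then averaged, although the abstract does not state how the per-fold results were combined.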
