Structure-based de novo prediction of zinc-binding sites in proteins of unknown function

MOTIVATION Zinc-binding proteins are the most abundant metalloproteins in the Protein Data Bank (PDB). Accurate prediction of zinc-binding sites in proteins of unknown function may provide important clues for inferring protein function. Because zinc binding is often associated with characteristic 3D arrangements of zinc ligand residues, its prediction may benefit from using not only the sequence but also the structure of proteins.

RESULTS In this work, we present a structure-based method, TEMSP (3D TEmplate-based Metal Site Prediction), to predict zinc-binding sites. TEMSP significantly improves on the best previously reported methods at predicting as many true zinc ligand residues as possible with minimal overprediction: if only those results in which all zinc ligand residues have been correctly predicted are counted as true positives, our method improves sensitivity from less than 30% to above 60%, and selectivity from around 25% to 80%. These results are for predictions based on apo-state structures. In addition, the method can reliably predict the zinc-bound local structures, generating predictions useful for function inference. We applied TEMSP to 1888 protein structures of the 'Unknown Function' class in the PDB. A number of zinc-binding sites were discovered de novo, i.e. based solely on the protein structures. Using the predicted local structures of these sites, we analyzed their possible functional roles.

AVAILABILITY TEMSP is freely available from http://netalign.ustc.edu.cn/temsp/.
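The strict evaluation criterion stated in the results (a prediction counts as a true positive only if all zinc ligand residues of a site are correctly predicted) can be illustrated with a minimal Python sketch. This is not the TEMSP code. The residue encoding, the function name `site_level_scores`, the example residue numbers, and the reading of "selectivity" as the fraction of predicted sites that are true positives are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: a residue is (chain ID, residue number).
Residue = Tuple[str, int]
Site = FrozenSet[Residue]


def site_level_scores(true_sites: List[Site],
                      predicted_sites: List[Site]) -> Tuple[float, float]:
    """Return (sensitivity, selectivity) under the all-ligands criterion.

    Assumption: a predicted site is a true positive only if it contains
    every ligand residue of at least one known zinc site.
    """
    # Predicted sites that fully cover some known site.
    tp_predictions = sum(
        1 for p in predicted_sites if any(t <= p for t in true_sites)
    )
    # Known sites that are fully covered by some predicted site.
    recovered = sum(
        1 for t in true_sites if any(t <= p for p in predicted_sites)
    )
    sensitivity = recovered / len(true_sites) if true_sites else 0.0
    selectivity = tp_predictions / len(predicted_sites) if predicted_sites else 0.0
    return sensitivity, selectivity


# Made-up example data: two known sites, three predictions.
true_sites = [
    frozenset({("A", 45), ("A", 48), ("A", 67), ("A", 70)}),
    frozenset({("A", 101), ("A", 105), ("A", 130)}),
]
predicted_sites = [
    frozenset({("A", 45), ("A", 48), ("A", 67), ("A", 70)}),  # complete site
    frozenset({("A", 101), ("A", 105)}),                      # misses one ligand
    frozenset({("A", 200), ("A", 204), ("A", 210)}),          # overprediction
]

sens, sel = site_level_scores(true_sites, predicted_sites)
print(f"sensitivity={sens:.2f} selectivity={sel:.2f}")
```

In this made-up example the second prediction recovers two of three ligands but is still not counted, which is why site-level scores of this kind are harder to achieve than per-residue scores.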
