Prediction of zinc-binding sites using multiple sequence profiles and machine learning methods.

The zinc ion (Zn2+) is a cofactor involved in numerous biological mechanisms, and zinc binding is recognized as one of the most important post-translational modifications in proteins. Accurate knowledge of zinc-binding sites in protein structures can therefore provide clues for elucidating protein folding and function. However, determining zinc-binding residues experimentally is usually labor-intensive and costly. Computational tools for identifying zinc-binding sites are thus highly desirable, especially in the current post-genomic era. In this work, we developed a novel zinc-binding site prediction method by combining several intensively trained machine learning models. To establish an accurate and generalizable method, we downloaded all zinc-binding proteins from the Protein Data Bank and prepared a non-redundant dataset. A well-prepared dataset from other groups was also used. Effective and complementary features were then extracted from the sequences and three-dimensional structures of these proteins, and several well-designed machine learning models were intensively trained on them to construct accurate predictors. To assess performance, the resulting predictors were stringently benchmarked on diverse zinc-binding sites. Several state-of-the-art in silico methods developed specifically for zinc-binding sites were also evaluated and compared. The results confirmed that our method is very competitive in real-world applications and could become a complementary tool to wet-lab experiments. To facilitate research in the community, a web server and a stand-alone program implementing our method were constructed and are publicly available at . The downloadable program can easily be used for high-throughput screening of potential zinc-binding sites across proteomes.
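As a rough illustration of what residue-level features derived from a sequence profile can look like, the following minimal Python sketch scores each column of a toy multiple sequence alignment by Shannon-entropy conservation and collects a sliding window of scores around each candidate residue. It is not the method described here: the candidate set (Cys, His, Asp, Glu, which commonly coordinate zinc), the conservation measure, the window size, and all function names are assumptions made for the example, and no classifier is trained.

```python
import math
from collections import Counter

# Assumption: residues that commonly coordinate zinc; the abstract does not list them.
CANDIDATES = set("CHDE")


def column_conservation(column):
    """Return 1 - normalized Shannon entropy of one alignment column (gaps ignored)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Normalize by the maximum entropy over the 20 standard amino acids.
    return 1.0 - entropy / math.log2(20)


def profile_from_msa(msa):
    """Per-column conservation scores; msa[0] is treated as the query sequence."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [column_conservation([seq[i] for seq in msa]) for i in range(length)]


def window_features(msa, half_window=2):
    """Return (column index, residue, window of scores) for each candidate residue."""
    query = msa[0]
    profile = profile_from_msa(msa)
    features = []
    for i, residue in enumerate(query):
        if residue not in CANDIDATES:
            continue
        window = [
            profile[j] if 0 <= j < len(query) else 0.0  # zero-pad past the termini
            for j in range(i - half_window, i + half_window + 1)
        ]
        features.append((i, residue, window))
    return features


if __name__ == "__main__":
    toy_msa = ["ACH", "ACH", "GCK"]  # hypothetical alignment, query first
    for position, residue, window in window_features(toy_msa, half_window=1):
        print(position, residue, [round(score, 3) for score in window])
```

In a full pipeline, vectors of this kind would be combined with other features and passed to a trained classifier; the sketch stops at feature extraction so that it stays self-contained.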
