Protein Metal Binding Residue Prediction Based on Neural Networks

Over one-third of protein structures contain metal ions, which are essential elements in living systems. Traditionally, structural biologists have investigated the properties of metalloproteins (proteins that bind metal ions) by physical means, interpreting enzyme function and reaction mechanisms from solved structures and from in vitro experiments. For most proteins, however, only the primary structure (the amino acid sequence) is known; the three-dimensional structure is often unavailable. In this paper, a direct analysis method is proposed that predicts the metal-binding amino acid residues of a protein from its sequence alone, using neural networks with sliding-window feature extraction and biological feature encoding. For four major bulk elements (calcium, potassium, magnesium, and sodium), the proposed method identifies metal-binding residues with sensitivity above 90% and very good accuracy under 5-fold cross-validation. Given these promising results, the method could be extended into a powerful tool for characterizing metal binding in the rapidly growing number of protein sequences.
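To make the idea of sliding-window feature extraction concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a window half-width of 3, a hypothetical padding symbol for positions past the chain ends, and plain one-hot encoding as a stand-in for the biological feature encoding, whose details the abstract does not give. All function names are illustrative. The sensitivity helper shows how the reported metric is conventionally computed, as true positives over all actual binding residues.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # hypothetical padding symbol (assumption, not from the paper)


def one_hot(residue: str) -> List[int]:
    """Return a 20-element indicator vector; padding or unknown symbols map to all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def window_features(sequence: str, half_width: int = 3) -> List[List[int]]:
    """Build one feature vector per residue from a window centred on that residue.

    half_width=3 (a 7-residue window) is an assumed value for illustration.
    """
    sequence = sequence.upper()
    padded = PAD * half_width + sequence + PAD * half_width
    width = 2 * half_width + 1
    features = []
    for i in range(len(sequence)):
        vector: List[int] = []
        for residue in padded[i:i + width]:
            vector.extend(one_hot(residue))
        features.append(vector)
    return features


def sensitivity(true_labels: Sequence[int], predicted_labels: Sequence[int]) -> float:
    """Sensitivity = TP / (TP + FN), with 1 marking a metal-binding residue."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    feats = window_features("MKHE", half_width=1)
    print(len(feats), len(feats[0]))
```

Each per-residue vector would then be fed to a classifier that outputs binding or non-binding for the central residue. A physicochemical encoding could replace `one_hot` without changing the windowing logic.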

[55]  Chin-Teng Lin,et al.  Protein Metal Binding Residue Prediction Based on Neural Networks , 2004, ICONIP.
