Prediction of enzyme classes from 3D structure: a general model and examples of experimental-theoretic scoring of peptide mass fingerprints of Leishmania proteins.

The number of protein and peptide structures deposited in the Protein Data Bank (PDB) and GenBank without functional annotation has increased. Consequently, there is high demand for theoretical models that predict these functions. Here, we trained a Markov Chain Model (MCM), and validated it with an external set, to classify proteins by their possible mechanism of action according to Enzyme Commission (EC) number. The proposed methodology is essentially new and predicts all EC classes with a single equation, without the need for a separate equation for each class or for nonlinear models with multiple outputs. In addition, the model may be used to predict whether a given peptide makes a positive or negative contribution to the activity of a given EC class. The model correctly predicts the first EC number for 106/151 (70.2%) oxidoreductases, 178/178 (100%) transferases, 223/223 (100%) hydrolases, 64/85 (75.3%) lyases, 74/74 (100%) isomerases, and 100/100 (100%) ligases, as well as 745/811 (91.9%) nonenzymes. This method may help to predict new enzyme proteins or to select peptide candidates that improve enzyme activity, which may be of interest for the prediction of new drugs or drug targets. To illustrate the model's application, we report the 2D-Electrophoresis (2DE) isolation from Leishmania infantum, the MALDI-TOF mass spectrometry characterization, and the theoretical study of the Peptide Mass Fingerprint (PMF) of a new protein sequence. The theoretical study focused on MASCOT, BLAST alignment, and alignment-free QSAR prediction of the contribution of the 29 peptides found in the PMF of the new protein to specific enzyme action. This combined strategy may be used to identify and predict peptides of prokaryotic and eukaryotic parasites and their hosts, as well as of other higher organisms, which may be of interest in drug development or target identification.
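The per-class percentages quoted above follow directly from the reported counts. As a minimal sketch (the dictionary layout and the helper name `percent_correct` are illustrative assumptions, not part of the published method), the arithmetic can be reproduced as follows:

```python
# Correct/total counts exactly as reported in the abstract.
# The data structure and helper name are hypothetical, for illustration only.
reported = {
    "oxidoreductases": (106, 151),
    "transferases": (178, 178),
    "hydrolases": (223, 223),
    "lyases": (64, 85),
    "isomerases": (74, 74),
    "ligases": (100, 100),
    "nonenzymes": (745, 811),
}


def percent_correct(correct, total):
    """Return the percentage of correctly classified cases, to one decimal."""
    return round(100.0 * correct / total, 1)


# Percentage of correct first-EC-number predictions for each class.
rates = {name: percent_correct(c, t) for name, (c, t) in reported.items()}

for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

Rounding to one decimal place matches the precision used in the abstract.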
