Computational chemistry study of 3D‐structure‐function relationships for enzymes based on Markov models for protein electrostatic, HINT, and van der Waals potentials

In an influential study, Dobson and Doig (J Mol Biol 2003, 330, 771) showed that proteins can be predicted to be enzymes or nonenzymes from their spatial structure without resorting to alignments. They used 52 protein features and a nonlinear support vector machine model to classify more than 1000 proteins collected from the PDB with 77% overall accuracy. The most useful features were the secondary‐structure content, the amino acid frequencies, the number of disulphide bonds, and the size of the largest cleft. Working on the same dataset used by Dobson and Doig, in this article we report a simple model, based on Markov chain models (MCM), that classifies protein 3D structures as enzymatic or not, taking the spatial structure into consideration without resorting to alignments. Here we define, for the first time, a general MCM to calculate the electrostatic potential, molecular vibrations, van der Waals (vdW) interactions, and hydrophobic interactions (HINT), and we use these quantities in comparative studies of potential fields and/or protein function prediction. The dataset is composed of 1371 proteins, divided into 689 enzymes and 682 nonenzymes, all collected from the PDB. The best model we found was a linear model obtained by linear discriminant analysis; it correctly classified 74.18% of the proteins using only two electrostatic potentials. In the work described here, we define 3D‐HINT potentials (μk) and use them for the first time to derive a classifier for enzymes. We analyzed ROC curves, the domain of applicability, parametric assumptions, and desirability maps, and we also tested nonlinear artificial neural network models, which did not improve on the linear model. In closing, this MCM allows fast calculation and comparison of different potentials, leading to accurate protein 3D structure‐function relationships with a model notably simpler than the previous one. © 2008 Wiley Periodicals, Inc. J Comput Chem 2009
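The abstract does not spell out how the Markov chain potentials μk are computed, so the following is only a minimal sketch of one plausible construction, not the authors' method. It assumes that sites (for example Cα atoms) interact when they lie within a distance cutoff, that the row-normalised contact map serves as the Markov transition matrix, and that μk is the average of a per-site property after k steps of the chain. The function name `markov_potentials`, the `cutoff` value, and the uniform starting distribution are all hypothetical choices made for illustration.

```python
import numpy as np


def markov_potentials(coords, phi, cutoff=7.0, k_max=2):
    """Return [mu_0, ..., mu_k_max] for one structure.

    Illustrative assumption, not taken from the paper:
    mu_k = p0 @ T^k @ phi, where T is a row-normalised contact map,
    p0 is uniform, and phi holds one property value per site
    (e.g. a charge-like or hydrophobicity-like quantity).
    """
    coords = np.asarray(coords, dtype=float)
    phi = np.asarray(phi, dtype=float)

    # Pairwise Euclidean distances between the n sites.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Contact map: sites within the cutoff interact; every site contacts itself,
    # so no row can sum to zero.
    contact = (dist <= cutoff).astype(float)

    # Row-normalise so each row is a probability distribution (stochastic matrix).
    transition = contact / contact.sum(axis=1, keepdims=True)

    # Start from a uniform distribution over sites (hypothetical choice).
    p = np.full(len(phi), 1.0 / len(phi))

    mus = []
    for _ in range(k_max + 1):
        mus.append(float(p @ phi))  # property averaged under the current distribution
        p = p @ transition          # advance the chain by one step
    return mus


# Toy structure: three collinear sites 5 units apart, property only on the first site.
toy_coords = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
toy_phi = [1.0, 0.0, 0.0]
toy_mus = markov_potentials(toy_coords, toy_phi, cutoff=7.0, k_max=2)
```

Under these assumptions, each protein yields a short vector of μk values per potential type, and such vectors could then serve as inputs to a linear discriminant classifier of the kind described above.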
