Amino acid sequence autocorrelation vectors and Bayesian‐regularized genetic neural networks for modeling protein conformational stability: Gene V protein mutants

Developing novel computational approaches for modeling protein properties from primary structure is a main goal in applied proteomics. In this work, we report the extension of the autocorrelation vector formalism to amino acid sequences in order to encode protein structural information for modeling purposes. Amino acid sequence autocorrelation (AASA) vectors were calculated by measuring, along the protein primary structure, the autocorrelations at sequence lags ranging from 1 to 15 of 48 amino acid/residue properties selected from the AAindex database. A total of 720 AASA descriptors (48 properties × 15 lags) were tested for building predictive models of the change in thermal unfolding Gibbs free energy (ΔΔG) of gene V protein upon mutation. To this end, ensembles of Bayesian‐regularized genetic neural networks (BRGNNs) were used to obtain an optimal nonlinear model of conformational stability. The ensemble predictor explained about 88% and 66% of the variance of the data in the training and test sets, respectively. Furthermore, the optimal AASA vector subset not only modeled unfolding stability successfully but also distributed the wild‐type and mutant gene V proteins well on a stability self‐organizing map (SOM) when used for unsupervised training of competitive neurons. Proteins 2007. © 2007 Wiley‐Liss, Inc.
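
As a rough illustration of the descriptor calculation described above, the following minimal sketch computes a sequence autocorrelation vector for a single residue property. The abstract does not give the exact formula, so the averaged product of property values at positions i and i + lag is an assumption here; the function name `aasa_vector` and the three-residue `toy_scale` are hypothetical and are not taken from AAindex.

```python
from typing import Dict, List


def aasa_vector(sequence: str, prop: Dict[str, float], max_lag: int = 15) -> List[float]:
    """Return one autocorrelation value per lag (1..max_lag) for one property.

    Assumed form (not confirmed by the source): the mean of p_i * p_(i+lag)
    over all residue pairs separated by `lag` positions in the sequence.
    """
    values = [prop[res] for res in sequence]
    n = len(values)
    out: List[float] = []
    for lag in range(1, max_lag + 1):
        pairs = n - lag  # number of residue pairs at this lag
        if pairs <= 0:
            out.append(0.0)  # sequence too short for this lag
            continue
        total = sum(values[i] * values[i + lag] for i in range(pairs))
        out.append(total / pairs)
    return out


# Hypothetical toy property scale, for illustration only.
toy_scale = {"A": 1.0, "G": 0.0, "V": 2.0}
vec = aasa_vector("AGVAGV", toy_scale, max_lag=3)

# Descriptor count stated in the abstract: 15 lags for each of 48 properties.
n_descriptors = 15 * 48
```

Repeating such a calculation for each of the 48 properties and concatenating the 15-element results would give one 720-element descriptor vector per protein sequence, from which a subset is then selected for modeling.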
