Amino acid similarity matrices based on force fields

MOTIVATION: We propose a general method for deriving amino acid substitution matrices from low-resolution force fields. Unlike currently popular methods, the approach does not rely on evolutionary arguments or on the alignment of sequences or structures. Instead, residues are computationally mutated and their contribution to the total energy or score is collected. Averaging these values over every position within a set of proteins yields a substitution matrix.

RESULTS: Example substitution matrices have been calculated from force fields based on different philosophies, and their performance has been compared with that of conventional substitution matrices. Although the approach can produce useful substitution matrices, it also highlights the virtues, deficiencies and biases of the source force fields. In addition, it allows a rather direct comparison of sequence alignment methods with the score functions underlying protein sequence-to-structure threading.

AVAILABILITY: Example substitution matrices are available from http://www.rsc.anu.edu.au/~zsuzsa/suppl/matrices.html.

SUPPLEMENTARY INFORMATION: The list of proteins used for data collection and the optimized parameters for the alignment are given as supplementary material at http://www.rsc.anu.edu.au/~zsuzsa/suppl/matrices.html.
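The mutate-and-average procedure described under MOTIVATION can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the `energy_fn(structure, position, residue)` interface, the use of the energy difference relative to the native residue, the symmetrization, the sign convention (higher score means more interchangeable), and the toy burial-based energy with arbitrary hydrophobicity values. A real calculation would plug in an actual low-resolution force field.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitution_matrix(proteins, energy_fn, alphabet=AMINO_ACIDS):
    """Average mutation energies over all positions of a protein set.

    proteins  : iterable of (sequence, structure) pairs
    energy_fn : hypothetical callable (structure, position, residue) -> float
                giving the energy contribution of `residue` at `position`
    Returns a dict mapping (a, b) to a symmetric similarity score.
    """
    totals = defaultdict(float)  # summed energy change for native a -> mutant b
    counts = defaultdict(int)    # number of positions whose native residue is a

    for sequence, structure in proteins:
        for pos, native in enumerate(sequence):
            if native not in alphabet:
                continue  # skip unknown or non-standard residues
            e_native = energy_fn(structure, pos, native)
            for mutant in alphabet:
                # Assumption: score a mutation by its energy change vs. native.
                totals[(native, mutant)] += energy_fn(structure, pos, mutant) - e_native
            counts[native] += 1

    matrix = {}
    for a in alphabet:
        for b in alphabet:
            if counts[a] and counts[b]:
                mean_ab = totals[(a, b)] / counts[a]
                mean_ba = totals[(b, a)] / counts[b]
                # Assumption: symmetrize and negate, so that cheap mutations
                # (small energy penalty) receive high similarity scores.
                matrix[(a, b)] = -0.5 * (mean_ab + mean_ba)
    return matrix


# Toy stand-in for a force field: arbitrary illustrative hydrophobicity values.
TOY_HYDROPHOBICITY = {"A": 1.0, "V": 2.0, "D": -2.0, "K": -1.5}


def toy_energy(structure, position, residue):
    """Buried sites (burial > 0.5) favour hydrophobic residues, exposed sites polar ones."""
    return -TOY_HYDROPHOBICITY[residue] * (structure[position] - 0.5)


# One toy "protein": a sequence plus a per-residue burial fraction as its structure.
toy_proteins = [("AVDK", [0.9, 0.8, 0.1, 0.2])]
toy_matrix = substitution_matrix(toy_proteins, toy_energy, alphabet="AVDK")
```

In this toy setting the two hydrophobic residues score as more similar to each other than to the charged ones, which is the kind of force-field bias the abstract says the method makes visible. Residue pairs for which either residue never occurs in the data set are simply left out of the returned dictionary.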
