A new criterion and method for amino acid classification.

It is widely accepted that many evolutionary changes of amino acid sequence in proteins are conservative: the replacement of one amino acid by another has a far greater chance of being accepted if the two residues have similar properties. It is difficult, however, to identify relevant physicochemical properties that capture this similarity. In this paper we introduce a criterion that determines similarity from an evolutionary point of view. Our criterion is based on the description of protein evolution by a Markov process and the corresponding matrix of instantaneous replacement rates. It is inspired by the conductance, a quantity that reflects the strength of mixing in a Markov process. Furthermore, we introduce a method to divide the 20 amino acid residues into subsets that achieve good scores under our criterion. The criterion is time-invariant: the same amino acid replacement rate matrix considered at different time distances leads to the same grouping, whereas different rate matrices lead to different groupings. It can therefore be used as an automated method to compare matrices derived from different types of proteins, or from parts of proteins that differ in structural or functional features. We present the groupings resulting from two standard matrices used in sequence alignment and phylogenetic tree estimation.
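The abstract does not state the criterion's formula, so the following is only an illustrative sketch of the underlying idea, not the authors' method. It uses the textbook definition of conductance for a subset S of states in a reversible Markov process, Φ(S) = (Σ_{i∈S, j∉S} π_i q_ij) / (Σ_{i∈S} π_i), where q is the instantaneous rate matrix and π its stationary distribution. Everything else is an assumption made for illustration: the four-state toy matrix (hypothetical, standing in for the 20 amino acids), the choice to score a partition by the mean conductance of its groups, and the brute-force search over two-way splits.

```python
from itertools import combinations


def rate_matrix(exch, pi):
    """Build a reversible rate matrix q_ij = s_ij * pi_j (i != j); rows sum to 0."""
    n = len(pi)
    q = [[exch[i][j] * pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums the off-diagonals
    return q


def conductance(q, pi, subset):
    """Stationary probability flow out of `subset`, divided by the subset's stationary mass."""
    inside = set(subset)
    flow = sum(pi[i] * q[i][j] for i in inside for j in range(len(pi)) if j not in inside)
    return flow / sum(pi[i] for i in inside)


def partition_score(q, pi, groups):
    """ASSUMPTION: score a partition by the mean conductance of its groups (lower = tighter)."""
    return sum(conductance(q, pi, g) for g in groups) / len(groups)


def best_two_way_split(q, pi):
    """Brute-force search over all splits into two non-empty groups (toy sizes only)."""
    states = range(len(pi))
    best = None
    for size in range(1, len(pi) // 2 + 1):
        for group in combinations(states, size):
            rest = tuple(s for s in states if s not in group)
            score = partition_score(q, pi, [group, rest])
            if best is None or score < best[0]:
                best = (score, [group, rest])
    return best


# Hypothetical toy data: states 0,1 exchange quickly, as do 2,3; cross-pair exchange is slow.
EXCH = [
    [0.0, 4.0, 0.4, 0.4],
    [4.0, 0.0, 0.4, 0.4],
    [0.4, 0.4, 0.0, 4.0],
    [0.4, 0.4, 4.0, 0.0],
]
PI = [0.25, 0.25, 0.25, 0.25]
Q = rate_matrix(EXCH, PI)

if __name__ == "__main__":
    score, groups = best_two_way_split(Q, PI)
    print("best split:", groups, "score:", round(score, 6))
```

In this toy setting the split that keeps the fast-exchanging pairs together has the lowest score, which mirrors the intuition in the abstract: residues that replace one another readily end up in the same group. Multiplying the whole rate matrix by a positive constant rescales every conductance by that constant, so the ranking of partitions is unchanged under such a rescaling in this sketch.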
