Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins.

An amino acid index is a set of 20 numerical values representing one of the various physicochemical and biochemical properties of amino acids. As a follow-up to the previous study, we have enlarged the database, which currently contains 402 published indices, and repeated the single-linkage cluster analysis. The results largely confirmed the previous findings. Another important feature of amino acids that can be represented numerically is the similarity between them. A similarity matrix, also called a mutation matrix, is a set of 20 × 20 numerical values used for protein sequence alignments and similarity searches. We have collected 42 published matrices, performed hierarchical cluster analyses, and identified several clusters that correspond to the nature of the data set and the method used to construct each mutation matrix. Further, we have tried to reproduce each mutation matrix from a combination of amino acid indices in order to understand which properties of amino acids are most strongly reflected in it. We found a relationship between the PAM units of Dayhoff's mutation matrix and the volume and hydrophobicity of amino acids. The database of 402 amino acid indices and 42 amino acid mutation matrices is publicly available on the Internet.
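As an illustration of the single-linkage cluster analysis mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes that the distance between two indices is 1 - |r|, where r is the Pearson correlation coefficient (the abstract does not state the distance measure). The index names, the numerical values, the five-residue length, and the `threshold` parameter are all hypothetical toy choices made for brevity; a real index has 20 values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def single_linkage(indices, threshold):
    """Agglomerate indices until the closest pair of clusters is farther
    apart than `threshold`. The distance between two clusters is the
    minimum distance between any two of their members (single linkage).
    Assumed distance between two indices: 1 - |Pearson r|."""
    dist = {
        frozenset((a, b)): 1.0 - abs(pearson(indices[a], indices[b]))
        for a, b in combinations(indices, 2)
    }
    clusters = [{name} for name in indices]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        d, c1, c2 = min(
            (
                (min(dist[frozenset((a, b))] for a in p for b in q), p, q)
                for p, q in combinations(clusters, 2)
            ),
            key=lambda item: item[0],
        )
        if d > threshold:
            break
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
    return clusters


# Hypothetical toy indices over five residues (NOT real published values).
toy_indices = {
    "hydrophobicity_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "hydrophobicity_b": [1.1, 2.1, 2.9, 4.2, 4.8],
    "volume_a": [5.0, 1.0, 4.0, 2.0, 3.0],
    "volume_b": [4.8, 1.2, 4.1, 2.2, 2.9],
}

toy_clusters = sorted(sorted(c) for c in single_linkage(toy_indices, threshold=0.1))
print(toy_clusters)
```

With these toy values, the two hydrophobicity-like vectors group together and the two volume-like vectors group together, because indices within each pair are almost perfectly correlated while indices from different pairs are only weakly correlated. Raising the threshold far enough would merge everything into a single cluster, which is the chaining behaviour typical of single linkage.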
