A new method for analyzing protein sequence relationships based on Sammon maps

Recent advances in gene sequencing and rational drug design have re-emphasized the need for new methods for protein analysis, classification, and structure and function prediction. In this article, we introduce a new method for analyzing protein sequences based on Sammon's non-linear mapping algorithm. When applied to a family of homologous sequences, the method captures the essential features of the similarity matrix and provides a faithful representation of chemical or evolutionary distance in a simple and intuitive way. The merits of the new algorithm are demonstrated using examples from the protein kinase family.
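To make the idea concrete, the following is a minimal sketch of a Sammon-style projection, not the implementation used in the article. It assumes a simple p-distance (fraction of mismatched positions) between pre-aligned sequences as a stand-in for the chemical or evolutionary distances mentioned above, and it minimizes Sammon's stress with plain gradient descent and step halving rather than Sammon's original pseudo-Newton update. The function names (`p_distance`, `sammon_stress`, `sammon_gradient`, `sammon_map`) and the toy sequences are hypothetical and chosen only for illustration.

```python
import math
import random


def p_distance(a, b):
    """Fraction of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def sammon_stress(target, coords):
    """Sammon's error: E = (1/c) * sum_{i<j} (D_ij - d_ij)^2 / D_ij, with c = sum_{i<j} D_ij."""
    n = len(coords)
    c = 0.0
    err = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            big_d = target[i][j]
            if big_d <= 0.0:
                continue  # pairs at zero target distance are skipped (assumption)
            d = math.dist(coords[i], coords[j])
            c += big_d
            err += (big_d - d) ** 2 / big_d
    return err / c if c > 0.0 else 0.0


def sammon_gradient(target, coords, eps=1e-12):
    """Gradient of the stress with respect to every map coordinate."""
    n, dim = len(coords), len(coords[0])
    c = sum(target[i][j] for i in range(n) for j in range(i + 1, n) if target[i][j] > 0.0)
    grad = [[0.0] * dim for _ in range(n)]
    if c <= 0.0:
        return grad
    for i in range(n):
        for j in range(n):
            big_d = target[i][j]
            if i == j or big_d <= 0.0:
                continue
            d = max(math.dist(coords[i], coords[j]), eps)
            w = -2.0 * (big_d - d) / (c * big_d * d)
            for k in range(dim):
                grad[i][k] += w * (coords[i][k] - coords[j][k])
    return grad


def sammon_map(target, dim=2, iters=500, step=0.5, seed=0):
    """Project items with pairwise distances `target` into `dim` dimensions.

    Plain gradient descent; a step is accepted only if it lowers the stress,
    otherwise the step size is halved. Returns (coordinates, final stress).
    """
    rng = random.Random(seed)
    coords = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(len(target))]
    stress = sammon_stress(target, coords)
    for _ in range(iters):
        grad = sammon_gradient(target, coords)
        trial = [[y - step * g for y, g in zip(row, g_row)]
                 for row, g_row in zip(coords, grad)]
        trial_stress = sammon_stress(target, trial)
        if trial_stress < stress:
            coords, stress = trial, trial_stress
            step *= 1.1
        else:
            step *= 0.5
    return coords, stress


if __name__ == "__main__":
    # Made-up toy strings, not real kinase data.
    seqs = ["GKGSFGKV", "GEGAFGKV", "GRGTFGEV", "GSGAYGSV"]
    dist = [[p_distance(a, b) for b in seqs] for a in seqs]
    points, stress = sammon_map(dist)
    for seq, (x, y) in zip(seqs, points):
        print(f"{seq}  x={x:+.3f}  y={y:+.3f}")
    print(f"final stress: {stress:.4f}")
```

Each sequence becomes one point on a 2D map, and the final stress indicates how faithfully the map distances reproduce the input distance matrix (zero would mean an exact reproduction). Because the stress function has local minima, rerunning with a different `seed` can give a different layout.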
