Novel protein weight matrix generated from amino acid indices

In recent years, numerous protein weight matrices have been developed that incorporate physical characteristics of proteins, such as local sequence-structure information, alpha-helix information, secondary structure information and solvent accessibility states. These matrices have been shown to generally improve protein sequence alignments over classical protein weight matrices such as the Point Accepted Mutation (PAM), Blocks Substitution Matrix (BLOSUM) and GONNET matrices, for which important limitations have been observed in recent works. In this paper, a novel protein weight matrix is constructed and presented. It is based not on mutation rates, as the PAM and BLOSUM matrices are, but on the physicochemical properties of each amino acid. More than 500 amino acid indices exist in the literature, each representing a distinct biological feature of proteins. For this study, 25 indices that represent general and widely accepted features of the amino acids were selected. Compared with the classical protein weight matrices, the proposed matrix offers the following advantages. It is not biased towards specific groups of protein sequences, because its values are calculated from the amino acid indices rather than from protein sequences. The same matrix can be used regardless of the homology of the sequences to be aligned or the mutation rate they exhibit. Its values can be correlated with the physical characteristics of the amino acids from which it is derived. Finally, different similarity matrices can be generated by considering different physical characteristics of the amino acids.
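
The abstract does not specify how the 25 indices are combined into matrix entries, so the following is only a minimal sketch of the general idea, not the construction used in this paper. It assumes two illustrative indices (Kyte-Doolittle hydropathy and the molecular weight of the free amino acid, as commonly tabulated), min-max scaling of each index, and a similarity score obtained by linearly rescaling the Euclidean distance between amino acids to the range [-1, 1]. The names `min_max_scale` and `similarity_matrix` are hypothetical.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Illustrative index 1: Kyte-Doolittle hydropathy (assumed choice, not the paper's set).
HYDROPATHY = dict(zip(AMINO_ACIDS, [
    1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
    3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2,
]))

# Illustrative index 2: molecular weight of the free amino acid in g/mol (assumed choice).
MOLECULAR_WEIGHT = dict(zip(AMINO_ACIDS, [
    89.09, 174.20, 132.12, 133.10, 121.16, 146.15, 147.13, 75.07, 155.16, 131.17,
    131.17, 146.19, 149.21, 165.19, 115.13, 105.09, 119.12, 204.23, 181.19, 117.15,
]))


def min_max_scale(index):
    """Rescale one amino acid index to [0, 1] so indices with different units are comparable."""
    lo, hi = min(index.values()), max(index.values())
    return {aa: (value - lo) / (hi - lo) for aa, value in index.items()}


def similarity_matrix(indices):
    """Build a 20x20 similarity matrix from a list of amino acid indices.

    Assumed scoring rule: Euclidean distance in the scaled property space,
    mapped linearly so identical residues score 1 and the most distant pair scores -1.
    """
    scaled = [min_max_scale(index) for index in indices]
    distance = {
        (a, b): sqrt(sum((s[a] - s[b]) ** 2 for s in scaled))
        for a, b in product(AMINO_ACIDS, repeat=2)
    }
    d_max = max(distance.values())
    return {pair: 1.0 - 2.0 * d / d_max for pair, d in distance.items()}


# Different selections of indices yield different matrices.
matrix = similarity_matrix([HYDROPATHY, MOLECULAR_WEIGHT])
hydropathy_only = similarity_matrix([HYDROPATHY])

# Isoleucine should score closer to leucine than to arginine under these two properties.
print(round(matrix[("I", "L")], 3), round(matrix[("I", "R")], 3))
```

Because the scores depend only on the property tables, no sequence data enters the calculation, and passing a different list of indices to `similarity_matrix` produces a different matrix. A real application would also need to rescale the scores and choose gap penalties to suit the alignment program.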
