Construction of protein dendrograms based on amino acid indices and Discrete Fourier Transform

Existing methods in the literature construct protein dendrograms from pairwise percent identity, which measures the percentage of similarity between two protein sequences. Because this is a parametric measure of similarity, and different parameters may yield different results, it does not guarantee that the globally optimal similarity values will be found. Protein dendrogram construction is also used in other areas, such as multiple protein sequence alignment, where it is very important that the most closely related protein sequences be identified and aligned first. Furthermore, dendrograms built from pairwise percent identity do not take the physical characteristics of protein sequences and amino acids into account. In this paper, a new method is proposed for constructing protein sequence dendrograms. In this method, multiple amino acid indices are used to encode protein sequences into numerical sequences, and the Discrete Fourier Transform is then used to construct the distance matrix. To show the applicability and robustness of the proposed method, a case study is presented using nine Cluster of Differentiation 4 (CD4) protein sequences extracted from the UniProt online database.
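The pipeline described above (index-based encoding, Discrete Fourier Transform, distance matrix, dendrogram) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a single index (Kyte-Doolittle hydropathy) stands in for the multiple amino acid indices, shorter sequences are zero-padded to a common length, the distance is taken as the Euclidean distance between DFT magnitude spectra, and the tree is built by average linkage (UPGMA). The function names and the toy sequences are hypothetical; they are not the CD4 sequences of the case study.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values (assumption: one index stands in for many).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def encode(seq, index=KD):
    """Map each residue to its index value, giving a numerical sequence."""
    return [index[residue] for residue in seq]

def dft_magnitudes(signal, n):
    """Magnitude spectrum of the signal, zero-padded to length n (assumption)."""
    padded = signal + [0.0] * (n - len(signal))
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(padded)))
            for k in range(n)]

def distance_matrix(seqs):
    """Euclidean distance between magnitude spectra (assumed distance)."""
    n = max(len(s) for s in seqs)
    spectra = [dft_magnitudes(encode(s), n) for s in seqs]
    return [[math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
             for b in spectra] for a in spectra]

def upgma(dist, labels):
    """Average-linkage clustering; returns the dendrogram as nested tuples."""
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)           # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = {}
        for c in clusters:                 # size-weighted average distance
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            merged[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(merged)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical toy sequences, not taken from UniProt.
toy = {"s1": "MKVLAAGIVG", "s2": "MKVLAAGIVA", "s3": "DDEEKKRRNN"}
matrix = distance_matrix(list(toy.values()))
tree = upgma(matrix, list(toy.keys()))
print(tree)
```

In this sketch the two hydrophobic toy sequences, which differ by a single residue, are joined before the charged one. With several indices, one option would be to compute a distance matrix per index and combine them, but how the indices are combined is not stated in the abstract and is left open here.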
