Comparative analysis of protein primary sequences with graph energy

In this paper, we propose the graph energy and Laplacian energy of the 20 amino acids, based on the codons that code for them, and use these quantities to construct a novel 2-D graphical representation of proteins. The representation has no circuit or degeneracy, represents each protein uniquely, and allows similarity/dissimilarity between proteins to be inspected visually, easily and quickly. It also leads to two novel protein descriptors: the graph energy of a protein sequence and the increment of graph energy between two protein sequences. We develop a similarity/dissimilarity model and use it to analyze ND5, 36PDs, 24TFs and 27AFPs, with results that are consistent with, or even better than, those of ClustalW.
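The abstract does not spell out how the codon-based graph of each amino acid is built, so the following is only a minimal sketch of the two underlying quantities under their standard definitions: graph energy as the sum of the absolute eigenvalues of the adjacency matrix, and Laplacian energy as the sum of |mu_i - 2m/n| over the Laplacian eigenvalues mu_i, for a graph with n vertices and m edges. The function names and the toy triangle graph are illustrative assumptions, not the authors' construction.

```python
import numpy as np


def graph_energy(adj):
    """Sum of absolute eigenvalues of a symmetric adjacency matrix."""
    a = np.asarray(adj, dtype=float)
    eigenvalues = np.linalg.eigvalsh(a)  # eigvalsh: symmetric/Hermitian input
    return float(np.sum(np.abs(eigenvalues)))


def laplacian_energy(adj):
    """Sum of |mu_i - 2m/n| over the eigenvalues mu_i of L = D - A."""
    a = np.asarray(adj, dtype=float)
    n = a.shape[0]
    degrees = a.sum(axis=1)
    m = degrees.sum() / 2.0  # each undirected edge is counted twice
    laplacian = np.diag(degrees) - a
    mu = np.linalg.eigvalsh(laplacian)
    return float(np.sum(np.abs(mu - 2.0 * m / n)))


# Toy graph (hypothetical, not taken from the paper): a triangle K3.
triangle = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]

if __name__ == "__main__":
    print("graph energy:", graph_energy(triangle))
    print("Laplacian energy:", laplacian_energy(triangle))
```

For the triangle, the adjacency eigenvalues are 2, -1, -1 and the Laplacian eigenvalues are 0, 3, 3 with 2m/n = 2, so both energies equal 4 up to floating-point error. A per-protein descriptor as described in the abstract would need the authors' amino-acid graphs as input, which are not given here.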
