Novel Method of 3-Dimensional Graphical Representation for Proteins and Its Application

In this article, we propose a 3-dimensional graphical representation of protein sequences based on 10 physicochemical properties of the 20 amino acids and the BLOSUM62 matrix. The representation carries evolutionary information and provides an intuitive visualization. To further analyze the similarity of proteins, we extract a characteristic vector from the graphical representation curve and use it to calculate the similarity distance between 2 protein sequences. To demonstrate the effectiveness of our approach, we apply it to 3 real data sets. The results are consistent with known evolutionary relationships and show that our method is effective in phylogenetic analysis.
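The pipeline outlined above (map residues to 3-dimensional points, trace a curve, reduce the curve to a vector, compare vectors by distance) can be illustrated with a minimal sketch. The abstract does not give the actual construction, so everything specific here is an assumption: the descriptor table `TOY_DESCRIPTORS` holds placeholder numbers rather than values derived from the 10 physicochemical properties or BLOSUM62, the curve is taken to be a cumulative sum of per-residue points, the characteristic vector is taken to be the mean of the curve points, and the similarity distance is taken to be Euclidean. The function names are hypothetical.

```python
from itertools import accumulate
from math import dist

# Placeholder 3-D descriptors for a few residues, for illustration only.
# These are NOT the property/BLOSUM62-derived coordinates of the article.
TOY_DESCRIPTORS = {
    "A": (1.0, 0.0, 0.0),
    "G": (0.0, 1.0, 0.0),
    "L": (0.0, 0.0, 1.0),
    "K": (1.0, 1.0, 0.0),
}


def curve(sequence, descriptors):
    """Return the 3-D curve as cumulative sums of per-residue descriptors (assumed construction)."""
    steps = [descriptors[residue] for residue in sequence]
    return list(accumulate(steps, lambda p, q: tuple(a + b for a, b in zip(p, q))))


def feature_vector(points):
    """Summarize a non-empty curve by its mean coordinates (an assumed, simple choice)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def similarity_distance(seq_a, seq_b, descriptors):
    """Euclidean distance between the feature vectors of two sequences (assumed metric)."""
    vec_a = feature_vector(curve(seq_a, descriptors))
    vec_b = feature_vector(curve(seq_b, descriptors))
    return dist(vec_a, vec_b)


if __name__ == "__main__":
    print(similarity_distance("AGLK", "AGKL", TOY_DESCRIPTORS))
```

Under these assumptions, a smaller distance indicates more similar curves, and a matrix of pairwise distances over a data set could then be passed to a standard tree-building tool for phylogenetic analysis.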
