Similarity/dissimilarity studies of protein sequences based on a new 2D graphical representation

A two-dimensional (2D) graphical representation of protein sequences based on six physicochemical properties of amino acids is outlined. Numerical characterizations of the resulting protein graphs serve as descriptors of the protein sequences. These descriptors are useful not only for the comparative study of proteins but also for encoding innate information about protein structure. The coefficient of determination is proposed as a new similarity/dissimilarity measure. Finally, a simple example illustrates the behavior of the new measure on ND6 (NADH dehydrogenase subunit 6) protein sequences from eight different species. The results demonstrate that the approach is convenient, fast, and efficient. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009
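As a rough illustration of how a coefficient of determination can act as a similarity score between two sequences, the sketch below compares two equal-length numerical descriptor vectors. The abstract does not say how the descriptors are built or exactly how the coefficient is computed, so several things here are assumptions: R² is taken as the squared Pearson correlation (the R² of a simple linear regression of one vector on the other), and the function names `coefficient_of_determination` and `dissimilarity` and the sample vectors are hypothetical.

```python
from math import fsum


def coefficient_of_determination(x, y):
    """R^2 of a simple linear regression of y on x.

    Assumption: computed as the squared Pearson correlation of two
    equal-length descriptor vectors. The paper may define it differently.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for a constant vector")
    return (sxy * sxy) / (sxx * syy)


def dissimilarity(x, y):
    """Hypothetical dissimilarity score: 1 - R^2 (0 means perfectly linear)."""
    return 1.0 - coefficient_of_determination(x, y)


if __name__ == "__main__":
    # Made-up descriptor vectors, for illustration only.
    species_a = [1.0, 2.0, 3.0]
    species_b = [2.0, 4.0, 6.0]
    species_c = [1.0, 3.0, 2.0]
    print(coefficient_of_determination(species_a, species_b))
    print(coefficient_of_determination(species_a, species_c))
```

Under this reading, values of R² near 1 would indicate similar sequences and values near 0 dissimilar ones. Applying the function to every pair of species would give a symmetric similarity matrix of the kind used in such comparisons.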
