Similarity analysis for DNA sequences based on chaos game representation. Case study: the albumin.

Using chaos game representation (CGR), we introduce a novel and straightforward method for identifying similarities and dissimilarities between DNA sequences of the same type from different organisms. A matrix is associated with each CGR pattern, and the similarities are obtained by comparing the matrices of the sequences of interest. Three methods of analyzing the resulting difference matrix are considered: a 3-dimensional representation that gives both local and global information, a numerical characterization based on an n-letter word similarity measure, and a statistical evaluation. The method is illustrated by applying it to albumin nucleotide sequences from eight mammal species, with human albumin taken as the reference.
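The matrix comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the corner assignment (A, C, G, T) follows the common convention from Jeffrey's CGR, the names `cgr_frequency_matrix`, `difference_matrix`, and `word_distance` are hypothetical, and the distance shown (half the sum of absolute differences) is a stand-in, since the abstract does not define the n-letter word similarity measure. The cell index is updated with integer shifts, which is equivalent to the CGR "move halfway to the corner" rule on a 2^n by 2^n grid.

```python
# Assumed corner convention (x, y); the paper's own assignment is not given in the abstract.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_frequency_matrix(seq, n):
    """Return a 2^n x 2^n matrix of n-letter word frequencies laid out as a CGR grid."""
    if n < 1:
        raise ValueError("n must be at least 1")
    size = 2 ** n
    counts = [[0] * size for _ in range(size)]
    ix = iy = 0   # grid cell of the current CGR point
    run = 0       # number of consecutive valid bases seen so far
    total = 0
    for base in seq.upper():
        if base not in CORNERS:
            run = 0  # assumption: ambiguous symbols break the current word
            continue
        cx, cy = CORNERS[base]
        # Halving the coordinate and adding half the corner coordinate,
        # expressed on n-bit cell indices.
        ix = (ix >> 1) | (cx << (n - 1))
        iy = (iy >> 1) | (cy << (n - 1))
        run += 1
        if run >= n:  # the cell now encodes a complete n-letter word
            counts[iy][ix] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no complete n-letter word")
    return [[c / total for c in row] for row in counts]


def difference_matrix(fa, fb):
    """Element-wise difference between two frequency matrices of equal size."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(fa, fb)]


def word_distance(diff):
    """Stand-in dissimilarity: half the sum of absolute differences, in [0, 1]."""
    return 0.5 * sum(abs(v) for row in diff for v in row)
```

With a reference sequence `ref` and a sequence `other`, `word_distance(difference_matrix(cgr_frequency_matrix(ref, n), cgr_frequency_matrix(other, n)))` gives 0 for identical word compositions and 1 when the two sequences share no n-letter word. The difference matrix itself can be plotted as a surface to inspect where the two CGR patterns diverge.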
