Novel techniques of graphical representation and analysis of DNA sequences—A review

The advent of automated DNA sequencing techniques has led to explosive growth in the number and length of DNA sequences obtained from different organisms. While this has produced a large accumulation of data in the DNA databases, it has also called for suitable techniques for rapidly viewing and analysing the data. Over the last few years several methods have been proposed that address these issues by representing a DNA sequence in a compact graphical form in one, two or three dimensions. These plots can be expanded as necessary to help visualize patterns in gene sequences and to aid in-depth analysis. Graphical techniques have proved useful for highlighting local and global base dominance, identifying regions of extensive repetitive sequence, differentiating between coding and non-coding regions, and indicating evolutionary divergence. Analysis with graphical methods has also provided insights into new structures in DNA sequences, such as fractals and long-range correlations, and some measures have been developed that help quantify the visual patterns. This review presents a comprehensive study of graphical representation methods and their applications in viewing and analysing long DNA sequences. It evaluates the merits of each method from a practical viewpoint and recommends the domains in which each is applicable. A discussion of the comparative merits and demerits of the various methods, and of possible future developments, is also included.
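
To make the idea of a one- or two-dimensional graphical representation concrete, the following is a minimal Python sketch that turns a DNA string into walk coordinates that could then be plotted. It assumes two illustrative mappings that are not prescribed by this abstract: in 1-D, purines (A, G) step up and pyrimidines (C, T) step down; in 2-D, each base moves one unit along its own half-axis (A to -x, G to +x, C to +y, T to -y). The function names `walk_1d`, `walk_2d` and `centre_of_mass` are hypothetical, and the centre of mass is only one simple summary chosen here to show how a visual pattern might be reduced to a number.

```python
"""Sketch: converting a DNA string into 1-D and 2-D walk coordinates.

Assumed mappings (illustrative only):
  1-D: purines (A, G) step +1, pyrimidines (C, T) step -1.
  2-D: A -> (-1, 0), G -> (+1, 0), C -> (0, +1), T -> (0, -1).
"""

from typing import List, Tuple

# Unit step for each base on the 2-D grid (assumed axis assignment).
STEP_2D = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def walk_1d(seq: str) -> List[int]:
    """Cumulative purine/pyrimidine walk; starts at 0, one entry per valid base."""
    y = 0
    path = [0]
    for base in seq.upper():
        if base in "AG":
            y += 1
        elif base in "CT":
            y -= 1
        else:
            continue  # skip ambiguous symbols such as N
        path.append(y)
    return path


def walk_2d(seq: str) -> List[Tuple[int, int]]:
    """Cumulative 2-D walk; each base moves one unit along its assigned axis."""
    x, y = 0, 0
    path = [(0, 0)]
    for base in seq.upper():
        step = STEP_2D.get(base)
        if step is None:
            continue  # skip ambiguous symbols such as N
        x += step[0]
        y += step[1]
        path.append((x, y))
    return path


def centre_of_mass(path: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Mean (x, y) of all walk points, including the origin (illustrative measure)."""
    n = len(path)
    return (sum(p[0] for p in path) / n, sum(p[1] for p in path) / n)


if __name__ == "__main__":
    print(walk_1d("ACGT"))
    print(walk_2d("ACGT"))
    print(centre_of_mass(walk_2d("ACGT")))
```

With these mappings a G-rich stretch drives the 2-D curve steadily along +x (for example, `walk_2d("GGGG")` ends at `(4, 0)`), which is the kind of local base dominance a plot makes visible at a glance. A balanced sequence such as `ACGT` returns to the origin.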