Genomic sequence analysis is usually performed with specialized software packages written for molecular biologists. The scope of such pre-programmed techniques is quite limited. Because DNA sequences contain a large amount of information, analyzing them without underlying assumptions may provide additional insights. The present article proposes two new graphical representations as examples of such methods. The random walk plot is designed to show the base composition in a compact form, whereas the gap plot visualizes positional correlations. The random walk plot represents the DNA sequence as a curve in a plane, traced out as a random walk: the four possible moves (left, right, up, and down) encode the four possible bases. Gap plots provide a tool for exhibiting various features of a sequence. They visualize periodic patterns within a sequence, either for a single type of base or between two types of bases.