Symmetry observations in long nucleotide sequences: a commentary on the Discovery Note of Qi and Cuticchia

The relative quantities of bases in DNA were determined chemically many years before sequencing technologies permitted direct counting of bases. Apparently unaware of the rich literature on the topic, bioinformaticists are today rediscovering the 'wheels' of Chargaff, Wyatt and other biochemists. It follows from Chargaff's second parity rule (%A = %T, %G = %C for single-stranded DNA) that the symmetries observed for the two pairs of complementary mononucleotides should also apply to the eight pairs of complementary dinucleotides, the thirty-two pairs of complementary trinucleotides, etc. This was made explicit by Prabhu in 1993 in a study of complete genomes and long genome segments from a wide range of taxa, and was rediscovered by Qi and Cuticchia in 2001 in a study of complete genomes. It follows from Chargaff's GC-rule (%GC tends to be uniform and species specific) that, within a species, oligonucleotides of the same GC% will be present in approximately equal quantities in single-stranded DNA. Thus, for example, while the quantities of CAT and ATG (reverse complements) will be closely correlated because of both of the above Chargaff rules, those of CAT and GTA (forward complements) will show some correlation only because of the latter rule. The need for complete genomic sequences in bioinformatic analyses may have been somewhat overplayed.
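The kind of check described above can be illustrated with a minimal Python sketch. It is not the procedure used by Prabhu or by Qi and Cuticchia; the function names (`reverse_complement`, `forward_complement`, `count_oligos`, `symmetry_table`) are hypothetical, and the sketch assumes a single strand supplied as a plain string over A, C, G and T, with overlapping windows counted and windows containing any other symbol ignored.

```python
from collections import Counter
from itertools import product

# Base-pairing map applied to one strand (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def forward_complement(oligo):
    """Complement each base without reversing, e.g. CAT -> GTA."""
    return oligo.translate(COMPLEMENT)


def reverse_complement(oligo):
    """Complement each base and reverse the result, e.g. CAT -> ATG."""
    return oligo.translate(COMPLEMENT)[::-1]


def count_oligos(strand, k):
    """Count overlapping k-base windows on a single strand.

    Windows containing symbols other than A, C, G, T are dropped
    (an assumption of this sketch, not something the text specifies).
    """
    strand = strand.upper()
    counts = Counter(strand[i:i + k] for i in range(len(strand) - k + 1))
    return Counter({o: n for o, n in counts.items() if set(o) <= set("ACGT")})


def symmetry_table(strand, k):
    """List (oligo, reverse complement, count, count) once per pair.

    Oligonucleotides that equal their own reverse complement are skipped,
    because they have no distinct partner to compare against.
    """
    counts = count_oligos(strand, k)
    rows = []
    for letters in product("ACGT", repeat=k):
        oligo = "".join(letters)
        partner = reverse_complement(oligo)
        if oligo < partner:  # keep each unordered pair exactly once
            rows.append((oligo, partner, counts[oligo], counts[partner]))
    return rows
```

Calling `symmetry_table(strand, 3)` on a long single-strand sequence yields one row for each of the thirty-two trinucleotide pairs; under the second parity rule the two counts in each row would be expected to be close. Swapping `reverse_complement` for `forward_complement` in the pairing step gives the weaker comparison (CAT against GTA) that the text attributes to the GC-rule alone.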

[1] T. Yomo, et al. Concordant evolution of coding and noncoding regions of DNA made possible by the universal rule of TA/CG deficiency-TG/CT excess. 1989, Proceedings of the National Academy of Sciences of the United States of America.

[2] D. Forsdyke. The origin of species, revisited. 2001.

[3] C. Alff-Steinberger. Codon usage in Homo sapiens: evidence for a coding pattern on the non-coding strand and evolutionary implications of dinucleotide discrimination. 1987, Journal of theoretical biology.

[4] J. Mortimer, et al. Chargaff's legacy. 2000, Gene.

[5] G. R. Wyatt. The nucleic acids of some insect viruses. 1952, The Journal of general physiology.

[6] V. Prabhu. Symmetry observations in long nucleotide sequences. 1993, Nucleic acids research.

[7] R. Blake, et al. Analysis of the codon bias in E. coli sequences. 1984, Journal of biomolecular structure & dynamics.

[8] E. Chargaff. Structure and function of nucleic acids as cell constituents. 1951, Federation proceedings.

[9] D. Forsdyke, et al. Relative roles of primary sequence and (G + C)% in determining the hierarchy of frequencies of complementary trinucleotide pairs in DNAs of different species. 1995, Journal of Molecular Evolution.

[10] D. R. Forsdyke, et al. Did Celera invent the internet? 2001, The Lancet.

[11] A. J. Cuticchia, et al. Compositional symmetries in complete genomes. 2001, Bioinformatics.

[12] D. Forsdyke, et al. Deviations from Chargaff's second parity rule correlate with direction of transcription. 1999, Journal of theoretical biology.

[13] D. Forsdyke, et al. Accounting units in DNA. 1999, Journal of theoretical biology.