Nucleotide distribution and the recognition of coding regions in DNA sequences: an information theory approach.

A method for recognizing coding regions in DNA sequences is described. The method is based on the observation, made in several cases, that the nucleotide distribution at the third codon position is more biased (less random) than at the other two positions. Because the nucleotide distribution at the third position is only weakly influenced by the amino acid distribution of the encoded protein, it is suggested that constraints at the DNA level must bias it. The distinction between DNA-level and protein-level constraints is discussed within the framework of information theory, and an analysis of the mitochondrial gene coding for subunit 1 of yeast cytochrome oxidase is presented.
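
As a rough illustration of the idea, the bias of the nucleotide distribution at each codon position can be quantified with Shannon entropy: a lower entropy means a more biased (less random) distribution. The abstract does not state the exact measure used, so the choice of Shannon entropy in bits, the function name `position_entropies`, and the toy sequence are all assumptions made for this sketch, not the authors' procedure.

```python
import math
from collections import Counter


def position_entropies(seq):
    """Return the Shannon entropy (in bits) of the nucleotide
    distribution at codon positions 1, 2 and 3 of `seq`.

    Hypothetical helper for illustration: `seq` is assumed to be
    read in frame from its first base; trailing bases that do not
    fill a whole codon are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop an incomplete final codon
    entropies = []
    for pos in range(3):
        # Every third base, starting at this codon position.
        counts = Counter(seq[pos:usable:3])
        total = sum(counts.values())
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies


if __name__ == "__main__":
    # Toy sequence (invented for illustration): codons GCA, TTA, CGA, AAA.
    h1, h2, h3 = position_entropies("GCATTACGAAAA")
    print(f"position 1: {h1:.3f} bits")
    print(f"position 2: {h2:.3f} bits")
    print(f"position 3: {h3:.3f} bits")
```

In this toy sequence every codon ends in A, so the third position has the lowest entropy. Under the assumptions above, scanning a long sequence in windows and in each of the three reading frames, and looking for a frame in which one position is consistently more biased than the other two, would be one way to apply the observation to locating coding regions.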