Given the vast amount of genomic sequence of mostly unknown function, in-silico prediction of functional regions is an important enterprise. Many genome browsers employ GC content, which has been observed to be elevated in gene-rich functional regions. This report shows that the entropy of di- and trinucleotide distributions provides a superior measure of genomic sequence functionality, and proposes an explanation of why the GC content must be elevated (closer to 50%) in functional regions. Regions with high entropy strongly co-localize with exons and provide genome-wide evidence of purifying selection acting on non-coding regions, such as decreased SNP density. The observations suggest that functional non-coding regions are optimised for mutation load such that transition mutations have less impact on functionality than transversions, leading to a decrease in the transversion-to-transition ratio in functional regions.
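To make the two measures compared above concrete, the following minimal sketch computes GC content and the Shannon entropy of the k-nucleotide distribution (k = 2 for dinucleotides, k = 3 for trinucleotides) of a sequence. The function names `gc_content` and `kmer_entropy`, the use of overlapping k-mers, and the choice to skip k-mers containing non-ACGT symbols are assumptions made for illustration; the report's actual windowing and counting scheme is not specified here and may differ.

```python
import math
from collections import Counter

BASES = set("ACGT")


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in BASES]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def kmer_entropy(seq: str, k: int) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution.

    Assumption for this sketch: k-mers containing any symbol other
    than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= BASES
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    window = "ACGTTGCAAGCTTAGGCATCGATCGGATACC"  # hypothetical example window
    print("GC content:", gc_content(window))
    print("dinucleotide entropy (bits):", kmer_entropy(window, 2))
    print("trinucleotide entropy (bits):", kmer_entropy(window, 3))
```

For k = 1 the entropy reaches its maximum of 2 bits only when all four bases are equally frequent, which implies a GC content of 50%. This is a general property of Shannon entropy and is offered only as intuition for the link between high entropy and GC content near 50%, not as a restatement of the report's own argument.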