Extracting structural and dynamical information from wavelet-based analysis of DNA sequences

The packaging of eucaryotic genomic DNA involves its wrapping around histone proteins [1], followed by the successive foldings of higher-order nucleoprotein complexes [2]. The bending properties of DNA play an essential role in these compaction processes [3, 4]. This hierarchically organized pathway is likely to be reflected in the fractal behavior of DNA bending signals in eucaryotic genomes; the challenge is to extract this structural information by a suitable reading of the DNA sequences. We show that when an adapted mathematical tool, the "wavelet transform microscope" [5, 6], is used to explore the fluctuations of bending profiles, it reveals a characteristic scale of 100-200 bp that separates two different regimes of (long-range) power-law correlations (PLC), common to eucaryotic as well as eubacterial and archaeal genomes. The same analysis applied to the DNA text yields results strikingly similar to those obtained with bending profiles, and this holds for all three kingdoms. In the small-scale regime, PLC are observed in eucaryotic genomes, in nuclear replicating DNA viruses and in archaeal genomes, in contrast with their total absence in the genomes of eubacteria and their viruses. This indicates that small-scale PLC are likely to be related to the mechanisms underlying the wrapping of DNA around histone proteins. These results, together with the observation of PLC between particular sequence motifs known to participate in the formation of nucleosomes (e.g. AA dinucleotides), show that the 10-200 bp PLC provide a very efficient diagnostic of the nucleosomal structure, in coding as well as in noncoding regions [7, 8]. We discuss possible interpretations of these PLC in terms of the physical mechanisms that might govern the positioning and dynamics of nucleosomes along the DNA chain through cooperative processes [8].
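The scale-by-scale analysis described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It replaces the bending profile with the classic purine/pyrimidine DNA walk (a stand-in coding chosen here for simplicity) and uses the Mexican hat (second derivative of a Gaussian) as the analyzing wavelet. It then estimates an exponent H as the slope of log σ(a) versus log a, where σ(a) is the root-mean-square of the wavelet coefficients at scale a. For an uncorrelated sequence H is close to 0.5, and values above 0.5 indicate persistent long-range correlations. Fitting separately below and above the 100-200 bp characteristic scale mimics the two regimes; the ranges 10-100 bp and 200-1000 bp are illustrative assumptions. The hypothetical helper `motif_walk` builds a walk from motif occurrences (e.g. AA dinucleotides) so the same estimator can be applied to sequence motifs.

```python
import numpy as np

# Assumed stand-in coding (not the bending profile used in the text):
# purines (A, G) -> +1, pyrimidines (C, T) -> -1, anything else (e.g. N) -> 0.
_PURINE_PYRIMIDINE = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(seq):
    """Cumulative purine/pyrimidine walk of a DNA string."""
    steps = np.array([_PURINE_PYRIMIDINE.get(c, 0) for c in seq.upper()], dtype=float)
    return np.cumsum(steps)


def motif_walk(seq, motif="AA"):
    """Hypothetical helper: walk built from overlapping occurrences of a motif."""
    seq = seq.upper()
    hits = np.array([seq.startswith(motif, i) for i in range(len(seq))], dtype=float)
    return np.cumsum(hits - hits.mean())  # centre so the walk has no mean drift


def _valid_convolve(signal, kernel):
    """FFT convolution keeping only samples unaffected by the sequence borders."""
    n, m = len(signal), len(kernel)
    if m > n:
        raise ValueError("sequence shorter than the wavelet support")
    size = n + m - 1
    nfft = 1 << (size - 1).bit_length()
    full = np.fft.irfft(np.fft.rfft(signal, nfft) * np.fft.rfft(kernel, nfft), nfft)
    return full[m - 1:n]


def wavelet_rms(walk, a):
    """sigma(a): r.m.s. of T(a, b) = (1/a) * sum_k walk[b + k] * psi(k / a)."""
    half = int(np.ceil(5 * a))  # truncate the wavelet support at +/- 5a
    t = np.arange(-half, half + 1) / a
    psi = (1.0 - t**2) * np.exp(-t**2 / 2)  # Mexican hat; even, so conv == corr
    coeffs = _valid_convolve(walk, psi) / a
    return np.sqrt(np.mean(coeffs**2))


def scaling_exponent(walk, scales):
    """Least-squares slope H of log sigma(a) versus log a over the given scales."""
    sigmas = [wavelet_rms(walk, a) for a in scales]
    slope, _ = np.polyfit(np.log(scales), np.log(sigmas), 1)
    return slope


if __name__ == "__main__":
    # An uncorrelated random sequence: both estimates should be close to 0.5.
    rng = np.random.default_rng(0)
    seq = "".join(rng.choice(list("ACGT"), size=2**17))
    walk = dna_walk(seq)
    print("H (10-100 bp):  ", scaling_exponent(walk, np.geomspace(10, 100, 6)))
    print("H (200-1000 bp):", scaling_exponent(walk, np.geomspace(200, 1000, 5)))
```

The Mexican hat is used here because its zero mean and zero first moment make σ(a) insensitive to the linear drift that compositional bias adds to the walk; a first-derivative-of-Gaussian wavelet would be an equally plausible choice.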
We further speculate that the large-scale PLC are the signature of the higher-order structure and dynamics of chromatin.

The availability of fully sequenced genomes offers the possibility to study the scale-invariance properties of DNA sequences over a wide range of scales, extending from tens to thousands of nucleotides. Indeed, measuring scale invariance makes it possible to reveal particular correlation structures between distant nucleotides or groups of nucleotides. During the past few years, there has been intense discussion about the existence, the nature and the origin of long-range correlations in genomic sequences [9, 10, 11, 12]. While it is now widely accepted that long-range correlations do exist in DNA sequences [6, 11, 13], their biological interpretation is still debated [9, 10, 11, 12, 13, 14, 15, 16, 17]. Most of the models proposed so far are based on genome plasticity and are supported by the reported absence of power-law correlations (PLC) in coding DNA sequences [5, 6, 13, 18]. In a previous work [17], from a systematic analysis of human exons, CDS's and introns, we found that PLC are present not only in non-coding sequences but also in coding regions, somehow hidden in their inner codon structure. Here we report the results of a recent study [7, 8] that

Present address: Ecole Normale Supérieure de Lyon, 46, allée d'Italie, F-69364 Lyon Cedex 07, France