Prediction of Significant Cruciform Structures from Sequence in Topologically Constrained DNA - A Probabilistic Modelling Approach

Sequence-dependent secondary DNA structures, such as cruciform or triplex DNA, are implicated in the regulation of gene transcription and other important biological processes at the molecular level. Sequences capable of forming these structures can readily be identified in entire genomes by appropriate search techniques. However, not every DNA segment containing a suitable sequence is equally likely to form an alternative structure. Calculating the free energy of the potential structures provides an estimate of their stability in vivo, but other structural factors, both local and non-local, are not taken into account by such a simplistic approach. In this paper we present the procedure we currently use to identify potential cruciform structures in DNA sequences. The procedure relies on the identification of palindromes (or inverted repeats) and their evaluation by a nucleic acid folding program (UNAFold). We further extended the procedure by adding a modelling step to filter the predicted cruciforms. The model takes into account the superhelical density of the analyzed DNA segments and calculates the probability of cruciforms forming at several locations in the analyzed DNA, based on the sequences in the stem and loop regions of the structures and on the competition among them.
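
The first step of the procedure, the identification of inverted repeats, can be illustrated with a minimal sketch. It is not the search used in the paper: the function name `find_inverted_repeats` and the parameters `min_stem` and `max_loop` are hypothetical, only perfect (mismatch-free) repeats are reported, and no folding program or superhelical model is involved. For each candidate loop position, the two arms are extended outwards for as long as the bases remain complementary.

```python
# Watson-Crick pairing used to test whether two bases can form a stem pair.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def find_inverted_repeats(seq, min_stem=6, max_loop=4):
    """Return (start, stem_length, loop_length) for perfect inverted repeats.

    `start` is the 0-based index of the first base of the left arm.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for loop in range(max_loop + 1):
        for left_end in range(n - loop - 1):
            left = left_end              # last base of the left arm
            right = left_end + loop + 1  # first base of the right arm
            stem = 0
            # Extend both arms outwards while the bases stay complementary.
            while left >= 0 and right < n and PAIR.get(seq[left]) == seq[right]:
                stem += 1
                left -= 1
                right += 1
            if stem >= min_stem:
                hits.append((left + 1, stem, loop))
    return hits


if __name__ == "__main__":
    # Left arm ACGTGC, loop TTTT, right arm GCACGT (reverse complement).
    print(find_inverted_repeats("ACGTGCTTTTGCACGT"))
```

Candidates found this way would still need to be scored, for example by a folding program, and filtered by a model of the kind described above; the sketch covers only the sequence search.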